CARMA Notes: Methodology
Perhaps the most common question from CARMA users is: “Where do the figures on the site come from?” The site’s FAQ section gives a brief answer, and the detailed CARMA v3.0 technical paper gives a complete one. This blog post aims to provide something in between: sufficient detail for the average user, but not too much.
In an ideal world, the electricity generation, CO2 emissions, location, and ownership of the world’s power plants would be regularly published by the appropriate national authorities. Of course, this is not the case. In fact, for the vast majority of countries it is difficult (if not impossible) to find any comprehensive, public information about the state of power generators, never mind their environmental impact.
At present, only about 10% of the world’s CO2-emitting power plants regularly disclose CO2 emissions through public databases. These plants are limited to the United States, European Union, Canada, India, and South Africa. Collectively, these databases disclose the specific source of about 35% of global power sector CO2 emissions. A database maintained by the International Atomic Energy Agency also discloses the electricity production of nuclear power plants worldwide. Some databases, such as those in the U.S., India, and South Africa, report both electricity generation and emissions. Others report only emissions. Some have corporate data, some do not. Some report the location of plants, some do not. Some are exhaustive (covering all facilities in their jurisdiction), some are not. Outside of these sources, information about plant-specific performance is fragmented, privately held, or non-existent. In short, it’s kind of a mess.
CARMA’s basic task is to consolidate the public data that is made available and come up with reasonable estimates for the rest. A private, commercial database maintained by Platts, Inc. provides valuable information about the location, engineering, fuel type, and ownership of effectively all of the world’s generating units (though it reveals nothing about actual generation or emissions). This database provides a basis for knowing which plants are reported publicly and which are undisclosed and in need of estimates. It also provides variables that can be used to predict the performance of a given plant.
Electricity generation and emissions for undisclosed plants are estimated using statistical models. The U.S. Department of Energy and Environmental Protection Agency publish detailed information about almost all power plants in the U.S. It is possible to process this data to determine what is happening at individual units in particular months. From this data, CARMA constructs a large, detailed dataset of unit-level, monthly performance at U.S. facilities (electricity generation, CO2 emissions, fuel type and consumption, etc.). This dataset is used to fit statistical models that predict how much electricity or CO2 a plant is likely to produce given its size, age, the various technologies and fuels in use, the nature of the electricity grid, etc. The resulting models are then applied to the global database provided by Platts, Inc. to derive estimated performance for power plants that lack publicly disclosed data.
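To make the fit-then-apply idea concrete, here is a minimal sketch in Python. It is not CARMA's actual model: it uses a single predictor (plant capacity) and ordinary least squares, whereas the real models draw on size, age, technology, fuel, grid characteristics, and more. Every name and number below is made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical "disclosed" training data standing in for the U.S. dataset:
# capacity in MW and annual generation in MWh. These values are invented.
us_capacity_mw = [100.0, 200.0, 300.0, 400.0]
us_generation_mwh = [450_000.0, 900_000.0, 1_350_000.0, 1_800_000.0]

# Step 1: fit the model on plants whose performance is publicly reported.
intercept, slope = fit_line(us_capacity_mw, us_generation_mwh)


def predict_generation_mwh(capacity_mw):
    """Apply the fitted model to a plant with no disclosed generation."""
    return intercept + slope * capacity_mw


# Step 2: apply it to an undisclosed plant known only by its capacity.
undisclosed_capacity_mw = 250.0  # hypothetical plant
estimate = predict_generation_mwh(undisclosed_capacity_mw)
print(f"Estimated generation: {estimate:,.0f} MWh")
```

The two-step shape is the point here: the model only ever sees disclosed plants when it is fitted, and the prediction for an undisclosed plant depends entirely on characteristics that can be observed in the global database.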
Obviously, there are limitations to this approach. For example, it assumes that the experience and performance of U.S. power plants are similar to those of plants in any other country (controlling, of course, for the various fuel and engineering characteristics that can be observed). The biggest challenge, though, is that utilization rates for plants across time are highly variable. This makes it difficult to accurately estimate the emissions of a given plant in a given year. While CARMA will always have difficulty precisely predicting the performance of a given plant in a given year, it does do a few things well:
First, and most obviously, it consolidates the high-quality information that is available. This is not a trivial task given that each national disclosure database has its own particular format, standards, and (annoying) idiosyncrasies. And the national databases alone do not provide all the desired data points, which means they must be painstakingly matched against the Platts, Inc. database (and others) to extract the full suite of required information.
Second, even when disclosed data is unavailable, CARMA’s statistical models do a decent job of estimating the amount of CO2 a given plant emits for each MWh of electricity produced (called “Intensity” on the site and given units of kgCO2/MWh). In some ways, the carbon intensity is the most important metric, since it allows us to identify those power plants that are the greatest relative threat in terms of climate change.
Third, even if CARMA’s models cannot precisely estimate total electricity generation or emissions for a given plant and year, the model output is likely to be indicative of the long-term performance of a plant. In other words, CARMA’s models still do a reasonable job of identifying a plant’s typical or average emissions over a longer period, even if the performance for any given year is over- or under-estimated.
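The intensity metric mentioned above is simple arithmetic: CO2 emitted divided by electricity generated. Here is a small Python sketch of that calculation and of ranking plants by it. It assumes emissions are supplied in metric tonnes (the CSV units may differ, so check before reusing this), and the plant names and figures are invented.

```python
def carbon_intensity_kg_per_mwh(co2_tonnes, generation_mwh):
    """Return kg of CO2 emitted per MWh of electricity generated.

    Assumes co2_tonnes is in metric tonnes (1 tonne = 1000 kg).
    """
    if generation_mwh <= 0:
        raise ValueError("generation_mwh must be positive")
    return co2_tonnes * 1000.0 / generation_mwh


# Hypothetical plants: (name, annual CO2 in tonnes, annual generation in MWh).
plants = [
    ("Plant A", 900_000.0, 1_000_000.0),
    ("Plant B", 200_000.0, 500_000.0),
    ("Plant C", 0.0, 750_000.0),
]

# Rank from most to least carbon-intensive.
ranked = sorted(
    ((name, carbon_intensity_kg_per_mwh(co2, mwh)) for name, co2, mwh in plants),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, intensity in ranked:
    print(f"{name}: {intensity:.0f} kgCO2/MWh")
```

Because intensity is a ratio, it is less sensitive than total emissions to how heavily a plant happened to run in a given year, which is part of why it can be estimated more reliably.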
Ultimately, CARMA is a mix of the ideal (disclosed data) and the imperfect (estimated data). The hope is that, over time, better disclosure efforts will tip the balance in favor of the former. Users interested in the U.S. will be happy to know that CARMA’s U.S. power plant data come from the DoE and EPA and can be considered high-quality. For facilities outside the U.S., it is possible to check the disclosure status of a given plant by downloading the associated .csv file from the site and finding the “dis” variable in the output. This variable indicates one of the following situations:
dis=0: No data disclosed
dis=1: Electricity generation disclosed
dis=2: CO2 emissions disclosed
dis=3: Electricity generation and CO2 emissions disclosed
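For anyone working with the downloaded file programmatically, the codes above can be decoded along these lines. This is a sketch only: the "dis" column name comes from the description above, but the "name" column and the sample rows are hypothetical stand-ins, so adjust them to match the actual .csv.

```python
import csv
import io

# Meaning of each "dis" code, as listed above.
DIS_LABELS = {
    0: "No data disclosed",
    1: "Electricity generation disclosed",
    2: "CO2 emissions disclosed",
    3: "Electricity generation and CO2 emissions disclosed",
}


def disclosure_flags(dis):
    """Return (generation_disclosed, emissions_disclosed) for a dis code."""
    if dis not in DIS_LABELS:
        raise ValueError(f"unexpected dis value: {dis!r}")
    return dis in (1, 3), dis in (2, 3)


# Stand-in for a downloaded file; the "name" column is an assumption.
sample_csv = "name,dis\nPlant A,0\nPlant B,3\nPlant C,2\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Plants whose generation and emissions are both model estimates.
estimated_only = [row["name"] for row in rows if int(row["dis"]) == 0]

for row in rows:
    print(f'{row["name"]}: {DIS_LABELS[int(row["dis"])]}')
```

To read a real download, replace the `io.StringIO(sample_csv)` argument with a file opened via `open(path, newline="")`.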