CARMA Notes: Geographic Data
CARMA v3.0 greatly improves both the scope and the quality of the geographic information provided for individual power plants.
Basic information like country, state/province, and city comes from a proprietary, commercial database of global power plants. Similar data is provided for U.S. facilities by the Department of Energy (DoE). This raw data is processed with an algorithm that cleans and standardizes it and then conducts a “fuzzy string” match against the open-source GeoNames place names database. The algorithm attempts to extract the maximum geographic information; in some cases, it is able to add information not found in the raw data. I believe this makes CARMA v3.0 the most extensive public geocoding of global power plants to date.
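To illustrate the general idea of a “fuzzy string” match, here is a minimal sketch in Python. It is not CARMA's actual algorithm, which is not described in detail here. The function names, the similarity cutoff of 0.8, and the tiny list of place names are all assumptions made for illustration; a real run would match against the full GeoNames database.

```python
import difflib

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())

def fuzzy_match(raw_name, gazetteer, cutoff=0.8):
    """Return the gazetteer name most similar to raw_name, or None.

    The cutoff (0.8) is an illustrative assumption, not a CARMA setting.
    """
    # Map each normalized name back to its original spelling.
    lookup = {normalize(name): name for name in gazetteer}
    hits = difflib.get_close_matches(normalize(raw_name), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# Hypothetical stand-in for a slice of the GeoNames place names.
places = ["Sydney", "Newcastle", "Wollongong"]

print(fuzzy_match("SYDNEY.", places))    # cleaned-up exact match
print(fuzzy_match("Wollongon", places))  # misspelling still matches
print(fuzzy_match("Zzzz", places))       # nothing similar enough
```

Cleaning the raw string before matching matters as much as the matching itself: most mismatches in plant records are differences in case, punctuation, or spacing rather than true misspellings.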
CARMA’s geocoding algorithm attempts to return the continent, country, state/province, county/district, city, and postal code for each plant. Data coverage is universal for continent and country and nearly so (>95%) for state/province. A further 80% of plants have been assigned a city, 40% a county/district (i.e. secondary region), and ~16% a unique postal code.
CARMA users are often interested in pinpointing the location of facilities, usually for the purposes of modeling pollutant dispersal or making high-quality maps. This requires specific geographic coordinates. Coordinate data from the DoE and EPA are used to provide high-resolution coordinates for all plants in the U.S. Outside the U.S., the same datasets that disclose emissions or power generation sometimes report coordinates, too. In addition, many large facilities have been manually geocoded using public sources (usually Wikipedia). All told, 12% of facilities, responsible for about 40% of current electricity and emissions, are assigned high-resolution coordinates.
When high-resolution coordinates are not available, CARMA v3.0 provides the coordinates for the associated city center, as given by GeoNames. An additional 70% of plants are assigned these approximate coordinates. Comparison of approximate and precise coordinates for plants with both suggests that the approximate coordinates have an average spatial error of about 7 km. Every .csv file downloaded from CARMA.org includes a variable called “crd” that indicates whether the given coordinates are approximate (crd=1) or precise (crd=2).
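As a rough illustration of how the “crd” variable might be used, the sketch below splits a downloaded file into precise and approximate records. Only the “crd” column and its values (1 and 2) come from the description above. The other column names, the sample rows, and the assumption that plants without coordinates have an empty “crd” field are hypothetical.

```python
import csv
import io

# Hypothetical sample; only the "crd" column is documented above.
SAMPLE_CSV = """plant_id,name,latitude,longitude,crd
1,Plant A,40.4,-3.7,1
2,Plant B,-33.9,151.2,2
3,Plant C,,,
"""

def split_by_precision(csv_text):
    """Return (precise, approximate) lists of rows based on the crd flag."""
    precise, approximate = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["crd"] == "2":      # precise, high-resolution coordinates
            precise.append(row)
        elif row["crd"] == "1":    # approximate, city-center coordinates
            approximate.append(row)
        # Any other value (assumed: empty) means no coordinates; skip it.
    return precise, approximate

precise, approximate = split_by_precision(SAMPLE_CSV)
print([r["name"] for r in precise])
print([r["name"] for r in approximate])
```

For dispersal modeling, it may make sense to keep only the precise records, since the approximate ones can be off by several kilometers.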
The CARMA website reports aggregate totals for the geographic entities previously mentioned, as well as counties, congressional districts, and metro areas for the U.S. The definition of a “metro area” has changed in v3.0 and now reflects the borders of “combined statistical areas”, as determined by the U.S. Office of Management and Budget (OMB). For users of CARMA’s API, it is important to note that all regions in CARMA v3.0 (excluding congressional districts and metro areas) now have unique, permanent identifiers that match those used by GeoNames. For example, the Australian state of New South Wales has region_id=2155400 (specified in the URL), which matches the identifier used by the GeoNames API. This allows the two databases to be easily linked, if desired.
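Because the identifiers are shared, linking the two databases amounts to a simple join on the ID. The sketch below shows one way this might look once records from both sources have been loaded. The record layouts are assumptions: “region_id” is taken from the example above, “geonameId” and “countryCode” follow GeoNames' JSON field naming, and the “label” field and the function name are hypothetical.

```python
def link_regions(carma_rows, geonames_rows):
    """Join CARMA regions to GeoNames records on their shared identifier."""
    geonames_by_id = {g["geonameId"]: g for g in geonames_rows}
    linked = []
    for region in carma_rows:
        match = geonames_by_id.get(region["region_id"])
        if match is not None:
            # Merge the two records; CARMA fields win on any key clash.
            linked.append({**match, **region})
    return linked

# Illustrative records only; the field layout is an assumption.
carma_rows = [{"region_id": 2155400, "label": "New South Wales"}]
geonames_rows = [{"geonameId": 2155400, "countryCode": "AU"}]

for row in link_regions(carma_rows, geonames_rows):
    print(row["region_id"], row["label"], row["countryCode"])
```

Congressional districts and metro areas would fall out of such a join, since they do not carry GeoNames identifiers.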
The regional totals provided in CARMA v3.0 are simply the aggregate electricity production and emissions of all geocoded facilities within the borders of the region in question. The one exception is cities. The city totals (for example, Madrid) are the aggregate of all plants with precise or approximate coordinates within 100 km (~60 miles) of the city center. CARMA v3.0 provides such totals for capital cities and those with a population greater than 50,000 – more than 13,000 cities worldwide. It’s also worth noting that CARMA’s algorithms attempt to ensure accurate country totals for electricity generation and, for most countries, CO2 emissions, using national totals from the DoE and the International Energy Agency. There may be discrepancies in some cases. If you notice any, please let me know.
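The city totals described above can be sketched as a radius search around the city center. The example below assumes great-circle (haversine) distance on a spherical Earth; the distance measure CARMA actually uses is not stated here. The field names, plant coordinates, and output values are hypothetical and chosen only to show the calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def city_total(city_lat, city_lon, plants, field, radius_km=100.0):
    """Sum `field` over all plants within radius_km of the city center."""
    return sum(
        p[field]
        for p in plants
        if haversine_km(city_lat, city_lon, p["lat"], p["lon"]) <= radius_km
    )

# Hypothetical plants: one near Madrid, one near Barcelona.
plants = [
    {"lat": 40.50, "lon": -3.60, "output": 10.0},
    {"lat": 41.39, "lon": 2.17, "output": 5.0},
]

# Approximate coordinates of central Madrid.
print(city_total(40.4168, -3.7038, plants, "output"))
```

Note that with a 100 km radius, a plant can count toward more than one city, so city totals should not be summed to obtain a regional figure.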