Paul Natsuo Kishimoto

Structuring data with SDMX

This post expands on some text from the README of iam-units to illustrate better ways of structuring data and metadata. These are especially helpful in systems research, where data from multiple disciplines, domains, or contexts are often combined.

iam-units is a thin wrapper around the very useful pint. I wrote it with some colleagues to handle unit conversions for data from integrated assessment models (IAMs) of energy and climate, including global warming potential (GWP) conversions between greenhouse gas (GHG) species:

In [1]: from iam_units import registry, convert_gwp
[17:30:03] WARNING  Redefining 'kt' (<class                      registry.py:541
                    'pint …

more ...

Handling country codes

In research with global scope and country- or country-group resolution, data commonly have one or more [1] dimensions that identify the country or countries associated with each observation. Problems can arise when inconsistent identifiers (for example, “United States” vs. “United States of America”) are used to label such a dimension, whether across different data sets or within a single one.

The best precaution against these problems is to convert idiosyncratic identifiers to short, standard ones, as soon as possible. ISO 3166 alpha-2 or alpha-3 codes (CA or CAN for Canada) are a natural choice for standard identifiers. [2]
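
As a rough illustration of that precaution, here is a minimal sketch that maps free-form labels to alpha-3 codes as soon as the data are read. The lookup table `ALPHA_3`, the helper `to_alpha_3`, and the sample `observations` are all hypothetical names invented for this example; a real project would more likely rely on a maintained code list than on a hand-written dictionary.

```python
# Hypothetical lookup table: idiosyncratic labels -> ISO 3166 alpha-3 codes.
# Only the few entries needed for this illustration are included.
ALPHA_3 = {
    "canada": "CAN",
    "united states": "USA",
    "united states of america": "USA",
}


def to_alpha_3(label: str) -> str:
    """Return the alpha-3 code for a free-form country label."""
    # Collapse runs of whitespace and ignore case before looking up.
    key = " ".join(label.split()).lower()
    try:
        return ALPHA_3[key]
    except KeyError:
        # Fail loudly rather than let an unknown label pass through.
        raise ValueError(f"No alpha-3 code known for {label!r}") from None


# Made-up observations that label the same country in two different ways.
observations = [
    ("United States", 1.0),
    ("United States of America", 2.0),
    ("Canada", 3.0),
]

# Convert the labels once, immediately after loading the data.
normalized = [(to_alpha_3(name), value) for name, value in observations]
```

Raising an error for unknown labels is a deliberate choice in this sketch: silently keeping an unrecognized identifier would reintroduce the inconsistency the conversion is meant to remove.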

In this post, I …

more ...