A network-based method to harmonize data classifications

Dario Diodato (2018), Papers in Evolutionary Economic Geography #18.43

A frequent problem in research is the harmonization of data to a common classification, for example of industries, commodities, occupations, or geographical areas. Statistical offices often provide concordance tables to match data through time or across different classifications, but these tables alone are often not sufficient to define a clear methodology for how the matching should be performed. In fact, concordance tables frequently contain many-to-many mappings between classifications. The issue is exacerbated when two or more concordance tables are concatenated.
In this Jupyter notebook, I discuss a network-based abstraction of this problem and propose, as a general solution, a method that identifies the network components (or the network communities) to make data converge to a new classification. The method simplifies the issue and greatly reduces conversion errors.
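
The component idea can be illustrated with a minimal sketch. It is not the notebook's implementation: the function names `harmonize` and `aggregate`, the side labels "A" and "B", and the toy codes are all assumptions made for illustration, and only connected components (not community detection) are covered. Each row of a concordance table is treated as an edge between a code in classification A and a code in classification B; every connected component of the resulting graph becomes one category of the harmonized classification.

```python
from collections import defaultdict


def harmonize(concordance):
    """Label every code in a many-to-many concordance by its connected component.

    `concordance` is an iterable of (code_a, code_b) pairs. Codes are tagged
    with their side ("A" or "B") so identical strings in the two
    classifications are not confused. Hypothetical helper, for illustration.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for code_a, code_b in concordance:
        root_a, root_b = find(("A", code_a)), find(("B", code_b))
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)

    # Number components deterministically, ordered by their smallest member.
    components = sorted(sorted(members) for members in groups.values())
    return {node: i for i, members in enumerate(components) for node in members}


def aggregate(values, labels, side):
    """Sum values reported in one classification into the harmonized groups."""
    totals = defaultdict(float)
    for code, value in values.items():
        totals[labels[(side, code)]] += value
    return dict(totals)


# Toy concordance: "01" splits into A1 and A2, while "02" also maps to A2.
concordance = [("01", "A1"), ("01", "A2"), ("02", "A2"), ("03", "A3")]
labels = harmonize(concordance)

old_data = {"01": 10.0, "02": 5.0, "03": 7.0}
new_data = {"A1": 4.0, "A2": 12.0, "A3": 8.0}
print(aggregate(old_data, labels, "A"))
print(aggregate(new_data, labels, "B"))
```

In this toy case the codes 01, 02, A1 and A2 collapse into a single harmonized group, so data reported in either classification can be summed into that group without choosing arbitrary split weights. The price is a coarser classification, which is the trade-off the component approach accepts.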

EUREGIO: The construction of a global IO database with regional detail for Europe for 2000-2010

Mark Thissen, Maureen Lankhuizen, Frank van Oort, Bart Los, Dario Diodato (2018), Tinbergen Institute Discussion Paper TI 2018-084/VI

This paper introduces the EUREGIO database: the first time series (annual, 2000-2010) of global IO tables with regional detail for the entire large trading bloc of the European Union. The construction of this database, which allows for regional analysis at the level of so-called NUTS2 regions, is presented in detail, covering both its methodology and its applications. The tables merge data from WIOD (the 2013 release) with regional economic accounts and interregional trade estimates developed by PBL Netherlands Environmental Assessment Agency, complemented with survey-based regional input-output data for a limited number of countries.