A network-based method to harmonize data classifications

Dario Diodato (2018), Papers in Evolutionary Economic Geography #18.43

A frequent problem in research is the harmonization of data to a common classification, for example of industries, commodities, occupations, or geographical areas. Statistical offices often provide concordance tables to match data through time or across different classifications, but these tables alone are often not sufficient to define a clear methodology for how the matching should be performed. In fact, concordance tables frequently contain many-to-many mappings between classifications. The issue is exacerbated when two or more concordance tables are concatenated.
In this Jupyter notebook, I discuss a network-based abstraction of this problem and propose, as a general solution, a method that identifies the network components (or the network communities) to make data converge to a new classification. The method simplifies the issue and greatly reduces conversion errors.
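
As an illustration of the network abstraction described above, the following is a minimal sketch and not the notebook's actual code. It assumes a concordance table given as pairs of codes, treats the codes of the two classifications as nodes of a bipartite graph, and takes each connected component as one category of the harmonized classification. The function names `find_components` and `aggregate`, and all codes and values, are hypothetical; the community-detection variant mentioned in the abstract is not shown.

```python
from collections import defaultdict


def find_components(concordance):
    """Group the codes of two classifications into connected components.

    `concordance` is an iterable of (code_a, code_b) pairs (hypothetical
    input format). Nodes are tagged "A" or "B" so that identical code
    strings in the two classifications stay distinct.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for code_a, code_b in concordance:
        root_a, root_b = find(("A", code_a)), find(("B", code_b))
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    # Sort for a deterministic ordering of components and members.
    return sorted(sorted(group) for group in groups.values())


def aggregate(values, components, side):
    """Sum values reported in one classification ("A" or "B") by component."""
    totals = [0] * len(components)
    for index, group in enumerate(components):
        for tag, code in group:
            if tag == side and code in values:
                totals[index] += values[code]
    return totals


# Hypothetical many-to-many concordance: a1 maps to b1 and b2; a2 also maps to b2.
concordance = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a3", "b3")]
components = find_components(concordance)

# Made-up data in each classification, aggregated to the common components.
totals_a = aggregate({"a1": 10, "a2": 5, "a3": 7}, components, "A")
totals_b = aggregate({"b1": 4, "b2": 11, "b3": 7}, components, "B")
```

In this toy case the codes a1, a2, b1, and b2 collapse into one harmonized category and a3 and b3 form another, so data reported in either classification can be summed to the same two categories without splitting any value across a many-to-many link.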

EUREGIO: The construction of a global IO database with regional detail for Europe for 2000-2010

Mark Thissen, Maureen Lankhuizen, Frank van Oort, Bart Los, Dario Diodato (2018), Tinbergen Institute Discussion Paper TI 2018-084/VI

This paper introduces the EUREGIO database: the first time series (annual, 2000-2010) of global IO tables with regional detail for the entire large trading bloc of the European Union. The construction of this database, which allows for regional analysis at the level of so-called NUTS2 regions, is presented in detail, covering both its methodology and its applications. The tables merge data from WIOD (the 2013 release) with regional economic accounts and with interregional trade estimates developed by PBL Netherlands Environmental Assessment Agency, complemented with survey-based regional input-output data for a limited number of countries.

Integration and Convergence in Regional Europe: European Regional Trade Flows from 2000 to 2010

Mark Thissen, Dario Diodato, and Frank van Oort (2013), PBL Netherlands Environmental Assessment Agency n.1036

Policy research analysing Europe’s recent focus on place-based development (Barca, 2009) and the regional smart specialisation perspective (McCann and Ortega-Argilés, 2011) has been hampered by data deficiencies. This is particularly the case for empirical evidence on interregional relations, which are central to these new policy initiatives based on a systems way of thinking about innovation and growth. As a solution to this problem, we propose the development of an up-to-date data set that meets certain requirements. The resulting bi-regional panel data set describes the most likely trade flows between European regions, given all the available information, and is consistent with national accounts over the 2000–2010 period.

Integrated Regional Europe: European Regional Trade Flows in 2000

Mark Thissen, Dario Diodato, and Frank van Oort (2013), PBL Netherlands Environmental Assessment Agency n.1035

This paper proposes a new methodology to determine interregional trade and presents a unique data set on trade between 256 European NUTS2 regions for the year 2000. The methodology stays close to a parameter-free approach as proposed by Simini et al. (2012), and therefore deviates from earlier methods based on the gravity model, which suffer from analytical inconsistencies. Unlike a gravity model estimation, our methodology stays as close as possible to observed data without imposing any geographical trade patterns.