Clio Infrastructure

Data harmonization

The central hub addresses the problem that, within the individual hubs, data are collected at different levels of aggregation, on different subjects, and often in different data structures. A generic model will be developed to enable cross-linking between the different collections of datasets while accounting for differences in subject (demographic, social and economic features), time period (year) and geographic level (micro, meso and macro).
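As an illustration only, the following sketch shows one way such a generic model could look in Python. The class names, field names, dataset labels and values are hypothetical placeholders and are not part of any agreed CLIO specification; the sketch assumes that records are cross-linked when they share place, year and geographic level.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Subject(Enum):
    DEMOGRAPHIC = "demographic"
    SOCIAL = "social"
    ECONOMIC = "economic"


class GeoLevel(Enum):
    MICRO = "micro"
    MESO = "meso"
    MACRO = "macro"


@dataclass(frozen=True)
class Observation:
    """One data point in a common shape, whatever hub it came from."""
    dataset: str      # hypothetical label of the source collection
    subject: Subject
    year: int
    geo_level: GeoLevel
    place: str        # hypothetical place code
    indicator: str
    value: float


def cross_link(observations):
    """Group observations that share place, year and geographic level."""
    linked = defaultdict(list)
    for obs in observations:
        linked[(obs.place, obs.year, obs.geo_level)].append(obs)
    return dict(linked)


# Placeholder values, not real data.
observations = [
    Observation("hub_a", Subject.DEMOGRAPHIC, 1900, GeoLevel.MACRO, "XX", "indicator_a", 1.0),
    Observation("hub_b", Subject.ECONOMIC, 1900, GeoLevel.MACRO, "XX", "indicator_b", 2.0),
    Observation("hub_b", Subject.ECONOMIC, 1910, GeoLevel.MACRO, "XX", "indicator_b", 3.0),
]
linked = cross_link(observations)
```

Here the two observations for 1900 end up under the same key even though they come from different collections and cover different subjects, which is the kind of cross-linking the generic model is meant to support.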

For this purpose, the data will be enriched with appropriate geographical (country) codes and occupational codes. The occupational codes follow HISCO, the global coding scheme for historical occupations, which is continually expanded and maintained by the University of Utrecht and the IISH.

Data harmonization needs to be done in such a way that the data receive a uniform structure and format, so that they can be used in the Gapminder and Statplanet visualization tools and can be downloaded and processed by statistical software.
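A minimal sketch of what a uniform structure and format could mean in practice is given below. It assumes a long table with one row per country, year and indicator, written as CSV; the column names, the `harmonize` and `to_csv` helpers, the use of three-letter country codes and the sample record are all assumptions made for illustration, not the format required by the visualization tools.

```python
import csv
import io

# Hypothetical target columns for the harmonized output.
FIELDS = ["country", "year", "indicator", "value"]


def harmonize(record, country_codes):
    """Map one raw hub record onto the assumed uniform columns."""
    name = record["country_name"].strip().lower()
    return {
        "country": country_codes[name],  # raises KeyError for unknown names
        "year": int(record["year"]),
        "indicator": record["indicator"].strip().lower(),
        "value": float(record["value"]),
    }


def to_csv(rows):
    """Serialize harmonized rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative lookup table and placeholder record, not real data.
country_codes = {"netherlands": "NLD"}
raw = [
    {"country_name": " Netherlands ", "year": "1850",
     "indicator": "Placeholder_Indicator", "value": "12.5"},
]
csv_text = to_csv([harmonize(r, country_codes) for r in raw])
```

Keeping every dataset in the same long layout means that a single export routine serves both download and further processing in statistical software.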

The pre-processing will take place in the distributed hubs. To ensure a clear understanding and division of work, we will define, specify and agree with all project partners on the technical and organisational framework of the CLIO-hubs, so that the agreed protocols, standards, roles and responsibilities are adhered to.