Outline of Resilience.IO working paper
Background
How could users USE this data?
What system/model do we need to fit into? (We already have the resilience.io system.)
Characteristics of the data
The majority of the data is geospatial, but in different forms.
Well structured and stored
Geospatial:
- Polygons: administrative information, travel data
- Points: air quality sensor stations
- Grids: population (to be confirmed)
How to harmonise on a particular geospatial type? Do we need to?
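One way to harmonise, if it is needed, is to bring everything onto the polygon type, for example by assigning each sensor point to the administrative area that contains it. The sketch below uses a plain ray-casting test with no GIS library. The area codes, station names and coordinates are made up for illustration, and a real pipeline would also need to handle coordinate reference systems, holes and multi-part polygons.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]         # vertices in order; the ring need not be closed


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_points_to_areas(
    points: Dict[str, Point], areas: Dict[str, Polygon]
) -> Dict[str, Optional[str]]:
    """Map each point name to the code of the first area containing it, else None."""
    return {
        name: next(
            (code for code, poly in areas.items() if point_in_polygon(pt, poly)),
            None,
        )
        for name, pt in points.items()
    }


# Hypothetical administrative areas (two adjacent squares) and sensor stations.
areas = {
    "AREA_A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "AREA_B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
stations = {
    "station_1": (1.0, 1.0),
    "station_2": (3.0, 0.5),
    "station_3": (5.0, 5.0),  # falls outside both areas
}

station_area = assign_points_to_areas(stations, areas)
```

Once stations carry an area code, point measurements can be joined to polygon-based data such as the administrative and travel datasets. Gridded data would need a similar step (cell centroid or area-weighted overlap), which is not shown here.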
Some data is also available only at a coarser level of spatial granularity
Other data is released at a coarser TEMPORAL granularity
- Hospital data
How could you achieve better granularity (may need to ask the hospital)?
Or give them the software? Or have some arrangement with them?
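Until finer-grained data can be obtained, the practical option is usually to aggregate the finer series up to the coarser one. A minimal sketch, assuming daily sensor readings and monthly hospital counts (all values below are invented for illustration):

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Tuple

Period = Tuple[int, int]  # (year, month)

# Hypothetical daily air quality readings: (day, value)
daily_readings: List[Tuple[date, float]] = [
    (date(2017, 1, 30), 40.0),
    (date(2017, 1, 31), 50.0),
    (date(2017, 2, 1), 30.0),
    (date(2017, 2, 2), 20.0),
    (date(2017, 2, 3), 10.0),
]

# Hypothetical hospital data, only released per month
monthly_admissions: Dict[Period, int] = {(2017, 1): 120, (2017, 2): 95}


def to_monthly_mean(readings: List[Tuple[date, float]]) -> Dict[Period, float]:
    """Aggregate daily values to one mean value per (year, month)."""
    buckets: Dict[Period, List[float]] = defaultdict(list)
    for day, value in readings:
        buckets[(day.year, day.month)].append(value)
    return {period: mean(values) for period, values in sorted(buckets.items())}


monthly_air_quality = to_monthly_mean(daily_readings)

# Join the two series on the shared (coarser) period.
joined = {
    period: (monthly_air_quality[period], monthly_admissions[period])
    for period in monthly_air_quality
    if period in monthly_admissions
}
```

Aggregation only works in one direction: the daily detail cannot be recovered from monthly hospital figures, which is why the questions above about asking the hospital or reaching an arrangement with them matter.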
What are the implications for the metadata?
- How does the data get connected to its metadata (data dictionary? is it web-enabled?)
- What do we need to know before we get the data? And when we want to use the data?
How could the metadata be improved?
- Add value labels/code lists (include relevant lookups for those lists; this could be a database or a URN/URI)
- Need temporal and spatial coverage and granularity
- Add geospatial enhancements (e.g. some datasets had NAMES of geographic areas but not codes)
NOTE: the resilience.io Data Collection Strategy document outlines metadata requirements that align with many of the items we discussed. See items 1. Data screening & collection and 2. Raw data collection.
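The improvements listed above could be captured in a small metadata record with a completeness check, plus a lookup that adds area codes to datasets that only carry area names. This is a sketch only: the field names, the `REQUIRED_FIELDS` list, the URN and the area codes are hypothetical and are not taken from the Data Collection Strategy document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical minimal metadata record for one dataset."""
    title: str
    spatial_coverage: str = ""
    spatial_granularity: str = ""
    temporal_coverage: str = ""
    temporal_granularity: str = ""
    # column name -> where its code list can be resolved (database table, URN or URI)
    code_lists: Dict[str, str] = field(default_factory=dict)


REQUIRED_FIELDS = [
    "spatial_coverage",
    "spatial_granularity",
    "temporal_coverage",
    "temporal_granularity",
    "code_lists",
]


def missing_fields(meta: DatasetMetadata) -> List[str]:
    """Return the required fields that are still empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(meta, name)]


def add_area_codes(
    rows: List[Dict[str, str]], name_to_code: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Geospatial enhancement: add an area code to rows that only have an area name."""
    enhanced = []
    for row in rows:
        key = row["area_name"].strip().lower()
        enhanced.append({**row, "area_code": name_to_code.get(key)})
    return enhanced


# Illustrative values only.
meta = DatasetMetadata(
    title="Hospital admissions",
    spatial_coverage="City-wide",
    spatial_granularity="Administrative area",
    temporal_coverage="2016 to 2017",
)
gaps = missing_fields(meta)

meta.code_lists["diagnosis"] = "urn:example:codelist:diagnosis"  # hypothetical URN
meta.temporal_granularity = "Monthly"
gaps_after = missing_fields(meta)

lookup = {"north district": "X001", "south district": "X002"}  # hypothetical codes
rows = [{"area_name": "North District"}, {"area_name": "Unknown Place"}]
coded_rows = add_area_codes(rows, lookup)
```

Rows whose name is not in the lookup get `None` rather than a guessed code, so they can be flagged for manual review.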
How could the metadata be automated?
- Found what appeared to be ICD codes in the hospital data
- Peter looked up the codes online
- We figured out what the codes meant in about 5 minutes
- This could be automated (or semi-automated)
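The manual lookup described above could be semi-automated along these lines: scan each column, flag those whose values mostly match a code pattern, then attach labels from a lookup. The sketch assumes ICD-10-style codes (a letter, two digits, optional decimal subdivision); the notes do not say which ICD version the hospital data used. The pattern is a simplification, the 0.9 threshold is arbitrary, and the two-entry label table stands in for a full code list or a resolution service.

```python
import re
from typing import Dict, List, Optional

# Simplified ICD-10-style shape: letter, two digits, optional ".d" or ".dd" (assumption).
ICD10_PATTERN = re.compile(r"[A-Z]\d{2}(\.\d{1,2})?")

# Tiny illustrative lookup; a real one would come from a full code list.
ILLUSTRATIVE_LABELS = {
    "J45": "Asthma",
    "I21": "Acute myocardial infarction",
}


def looks_like_icd10(value: str) -> bool:
    """True if the whole value has the assumed ICD-10-style shape."""
    return ICD10_PATTERN.fullmatch(value.strip().upper()) is not None


def share_matching(values: List[str]) -> float:
    """Fraction of non-empty values that look like ICD-10 codes."""
    non_empty = [v for v in values if v and v.strip()]
    if not non_empty:
        return 0.0
    return sum(looks_like_icd10(v) for v in non_empty) / len(non_empty)


def detect_code_columns(table: Dict[str, List[str]], threshold: float = 0.9) -> List[str]:
    """Names of columns where at least `threshold` of values match the pattern."""
    return [col for col, values in table.items() if share_matching(values) >= threshold]


def label_for(code: str) -> Optional[str]:
    """Look up the three-character category label, or None if unknown."""
    category = code.strip().upper().split(".")[0]
    return ILLUSTRATIVE_LABELS.get(category)


# Hypothetical extract of a hospital table, stored column-wise.
table = {
    "diagnosis": ["J45", "I21.0", "j45.9"],
    "ward": ["A1", "B2", "C3"],
    "patients": ["12", "7", "3"],
}

code_columns = detect_code_columns(table)
labels = [label_for(code) for code in table["diagnosis"]]
```

A person would still confirm the flagged columns before the labels are written into the metadata, which is the "semi-automated" part.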
Implications for the other pilot projects
- The issues above probably apply to the IDDO project
- Do they apply to the Sendai project?
- An example would be the Spatial Data on the Web Best Practices guide. Also the World Bank IHSN and Microdata catalogue?
- Question - How could you implement this in locations with varying levels of capability?
Implications for data integration generally
- Spatial and temporal
- Where does scalability become an issue - what resources do you have access to?
- What are the data pipelines that you are leveraging (or not)?
- What architecture will be required?
- Resolution services for identifying data types (BioPortal as an example?)
- Accessibility considerations - privacy, gatekeeping, time and resources required