How could users USE this data?
What system/model does the data need to fit into? (We already have the resilience.io system.)
Characteristics of the data
The majority of the data is geospatial, but in different forms.
The data is well structured and well stored.
- Polygons: administrative information, travel data
- Points: air quality sensor stations
- Grids: population (to be confirmed)
How do we harmonise on a single geospatial type? Do we need to?
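One common way to harmonise points onto polygons is to assign each sensor station to the administrative area that contains it and then aggregate readings per area. The sketch below shows this with a plain ray-casting point-in-polygon test. It assumes planar (x, y) coordinates in a shared reference system, simple non-overlapping polygons, and hypothetical station and area IDs; points lying exactly on a shared boundary are not handled deterministically.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]         # vertices of one ring; closing vertex optional


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: inside if a ray from pt crosses the boundary an odd number of times."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_points_to_areas(
    points: Dict[str, Point], areas: Dict[str, Polygon]
) -> Dict[str, Optional[str]]:
    """Map each point id to the first area containing it, or None if no area does."""
    return {
        pid: next((aid for aid, poly in areas.items() if point_in_polygon(pt, poly)), None)
        for pid, pt in points.items()
    }


def mean_by_area(
    readings: Dict[str, float], assignment: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Average point readings within each area, dropping points outside every area."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for pid, value in readings.items():
        area = assignment.get(pid)
        if area is None:
            continue
        sums[area] = sums.get(area, 0.0) + value
        counts[area] = counts.get(area, 0) + 1
    return {area: sums[area] / counts[area] for area in sums}
```

For real boundary files, a GIS library such as GeoPandas (`geopandas.sjoin`) would do the same join with proper coordinate reference system handling; the hand-written version only illustrates the idea.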
Some data is also at a coarser spatial granularity.
Other data is released at a coarser TEMPORAL granularity:
- Hospital data
How could we achieve finer granularity? We may need to ask the hospital.
Or give them the software? Or set up some arrangement with them?
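Until finer hospital data is available, one workaround is to aggregate the finer-grained data up to the hospital's reporting period before combining the two. The sketch below assumes, purely for illustration, daily sensor readings and monthly admission counts keyed by "YYYY-MM"; the notes do not state the hospital data's actual period, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def month_key(d: date) -> str:
    """Label a date with its calendar month, e.g. date(2016, 3, 14) -> '2016-03'."""
    return f"{d.year:04d}-{d.month:02d}"


def aggregate_monthly(
    daily: Iterable[Tuple[date, float]], how: str = "mean"
) -> Dict[str, float]:
    """Roll daily values up to calendar months using the mean or the sum."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for d, value in daily:
        buckets[month_key(d)].append(value)
    if how == "mean":
        return {m: mean(vals) for m, vals in sorted(buckets.items())}
    if how == "sum":
        return {m: sum(vals) for m, vals in sorted(buckets.items())}
    raise ValueError(f"unsupported aggregation: {how!r}")


def align_monthly(
    exposure: Dict[str, float], admissions: Dict[str, int]
) -> Dict[str, Tuple[float, int]]:
    """Keep only months present in both series, pairing exposure with admissions."""
    common = sorted(exposure.keys() & admissions.keys())
    return {m: (exposure[m], admissions[m]) for m in common}
```

Whether the mean or the sum is appropriate depends on the variable: concentrations are usually averaged, counts summed. Aggregating discards detail, which is why obtaining finer data from the source is still preferable.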
- How does the data get connected to its metadata (a data dictionary? is it web-enabled?)
- What do we need to know before we get the data, and what do we need to know when we want to use it?
How could the metadata be improved?
- Add value labels/code lists, and include the relevant lookups for those lists (this could be a database or a URN/URI)
- Include temporal and spatial coverage and granularity
- Add geospatial enhancements (e.g. some datasets had NAMES of geographic areas but not codes)
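The metadata improvements listed above could be captured in a simple structured record, alongside a step that attaches area codes where a dataset supplies only area names. The sketch below is hypothetical: the field names, the in-memory code lists (standing in for a database or URN/URI lookup), and the name-to-code table are illustrative assumptions, not part of any existing resilience.io schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record: coverage, granularity and code lists."""
    title: str
    temporal_coverage: Tuple[str, str]   # (start, end) as ISO 8601 dates
    temporal_granularity: str            # e.g. "daily", "monthly"
    spatial_coverage: str                # e.g. name of the region covered
    spatial_granularity: str             # e.g. "point", "district"
    # column name -> {code: label}; could instead point to a database or URI
    code_lists: Dict[str, Dict[str, str]] = field(default_factory=dict)


def label_values(meta: DatasetMetadata, column: str, values: List[str]) -> List[str]:
    """Replace codes with their labels; codes missing from the list pass through unchanged."""
    lookup = meta.code_lists.get(column, {})
    return [lookup.get(v, v) for v in values]


def add_area_codes(
    rows: List[Dict[str, str]],
    name_to_code: Dict[str, str],          # keys normalised to lower case
    name_field: str = "area_name",
    code_field: str = "area_code",
) -> Tuple[List[Dict[str, Optional[str]]], List[str]]:
    """Attach area codes to rows that only carry area names; report names with no match."""
    enriched, unmatched = [], []
    for row in rows:
        code = name_to_code.get(row[name_field].strip().lower())
        if code is None:
            unmatched.append(row[name_field])
        enriched.append({**row, code_field: code})
    return enriched, unmatched
```

Returning the unmatched names, rather than silently dropping them, leaves room for a person to resolve spelling variants by hand.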
NOTE: the resilience.io Data Collection Strategy document outlines metadata requirements that align with many of the items we discussed. See items 1 (Data screening & collection) and 2 (Raw data collection).
- We found what appeared to be ICD codes in the hospital data.
- Peter looked the codes up online.
- We worked out what the codes meant in about 5 minutes.
- This could be automated (or semi-automated).
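A semi-automated version of that lookup could flag columns whose values look like ICD-10 codes and attach labels for a person to confirm. The sketch below uses a loose regular expression for the ICD-10 code shape and a three-entry lookup that is only an illustrative subset; a real implementation would load the full classification from an authoritative source. The function names and threshold are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Rough shape of an ICD-10 code: a letter, two digits, then optionally a dot
# and up to four more characters (e.g. "J45", "J45.9"). Heuristic, not a validator.
ICD10_LIKE = re.compile(r"^[A-Z][0-9]{2}(\.[0-9A-Z]{1,4})?$")

# Illustrative subset of three-character ICD-10 categories; a real lookup
# would load the full classification from an authoritative source.
ICD10_SUBSET: Dict[str, str] = {
    "I21": "Acute myocardial infarction",
    "J18": "Pneumonia, organism unspecified",
    "J45": "Asthma",
}


def looks_like_icd10(values: Iterable[str], threshold: float = 0.9) -> bool:
    """Flag a column as likely ICD-10 if most non-empty values match the pattern."""
    vals = [v.strip().upper() for v in values if v and v.strip()]
    if not vals:
        return False
    hits = sum(1 for v in vals if ICD10_LIKE.match(v))
    return hits / len(vals) >= threshold


def describe_codes(
    values: Iterable[str], lookup: Dict[str, str] = ICD10_SUBSET
) -> List[Tuple[str, Optional[str]]]:
    """Pair each code with the label of its three-character category, or None if unknown."""
    out = []
    for v in values:
        code = v.strip().upper()
        out.append((code, lookup.get(code[:3])))
    return out
```

The pattern check only suggests a candidate; a human would still confirm that the column really holds ICD codes, and which revision, before the labels are trusted.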
Implications for the other pilot projects
- The issues above probably apply to the IDDO project
- Do they apply to the Sendai project?
- Examples to draw on: the W3C/OGC Spatial Data on the Web Best Practices guide. Also the World Bank IHSN and Microdata catalogue?
- Question: how could this be implemented in locations with varying levels of capability?
Implications for data integration generally
- Spatial and temporal
- Where does scalability become an issue, and what resources do you have access to?
- What data pipelines are you leveraging (or not)?
- What architecture will be required?
- Resolution services for identifying data types (BioPortal as an example?)
- Accessibility considerations: privacy, gatekeeping, and the time and resources required
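A resolution service for identifying data types could start as a registry of recognisers that each score how well a column's values fit a known type. The sketch below is a local stand-in, not a client for BioPortal or any other real service: the recogniser names, the three example types, and the 0.9 threshold are all hypothetical choices.

```python
from datetime import date
from typing import Callable, Dict, Iterable, Optional


def _is_iso_date(v: str) -> bool:
    """True if v parses as an ISO 8601 calendar date such as 2016-01-31."""
    try:
        date.fromisoformat(v)
        return True
    except ValueError:
        return False


def _is_decimal_degrees(v: str) -> bool:
    """True if v is a decimal number within the longitude range [-180, 180]."""
    try:
        x = float(v)
    except ValueError:
        return False
    return "." in v and -180.0 <= x <= 180.0


def _is_integer(v: str) -> bool:
    """True if v is an optionally negative whole number."""
    return v.lstrip("-").isdigit()


# Hypothetical registry: type name -> recogniser function.
RECOGNISERS: Dict[str, Callable[[str], bool]] = {
    "iso_date": _is_iso_date,
    "decimal_degrees": _is_decimal_degrees,
    "integer": _is_integer,
}


def identify_type(
    values: Iterable[str],
    recognisers: Dict[str, Callable[[str], bool]] = RECOGNISERS,
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the best-scoring type name, or None if no recogniser reaches the threshold."""
    vals = [v.strip() for v in values if v.strip()]
    if not vals:
        return None
    best, best_score = None, 0.0
    for name, test in recognisers.items():
        score = sum(test(v) for v in vals) / len(vals)
        if score > best_score:  # ties keep the earlier-registered type
            best, best_score = name, score
    return best if best_score >= threshold else None
```

Keeping the recognisers in a registry means a remote lookup (for example, querying an ontology service for coded terms) could be added later as one more entry without changing `identify_type`.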