Simon's worries
Discovery - Google Dataset Search will become the go-to, but is (currently) very shallow
- little or no content from stats agencies ... because it is not (yet) visible in the catalogs harvested by Google
- licensing/privacy constraints ...
- metadata must support access-restrictions - dct:accessRights, dct:license
- some from 3rd parties, VARs, or secondary sources (e.g. unemployment data from a biodiversity service!)
- licensing/privacy constraints ...
- standardisation of thematic/semantic content
- (In principle) is supported by e.g. dcat:theme, sosa:observedProperty, sosa:hasFeatureOfInterest, sosa:usedProcedure, sosa:madeBySensor
- supported by controlled vocabularies/registers
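As a minimal sketch of the metadata the Discovery points above call for, the following builds a DCAT-style dataset description as JSON-LD with dct:license, dct:accessRights and dcat:theme filled in. The function name build_dataset_record, the title, the description and the example.org theme URI are hypothetical placeholders, and the choice of a Creative Commons licence and the EU access-right vocabulary is an assumption for illustration, not something these notes prescribe.

```python
import json


def build_dataset_record(title, description, license_uri, access_rights_uri, theme_uris):
    """Return a DCAT-style dataset description as a JSON-LD dictionary."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        # Licence and access restrictions are stated as URIs so harvesters can filter on them.
        "dct:license": {"@id": license_uri},
        "dct:accessRights": {"@id": access_rights_uri},
        # Themes point at terms from a controlled vocabulary or register.
        "dcat:theme": [{"@id": uri} for uri in theme_uris],
    }


# All values below are illustrative assumptions.
record = build_dataset_record(
    title="Example unemployment statistics",
    description="Hypothetical dataset used to illustrate the metadata fields.",
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    access_rights_uri="http://publications.europa.eu/resource/authority/access-right/PUBLIC",
    theme_uris=["https://example.org/vocab/theme/labour-market"],
)

print(json.dumps(record, indent=2))
```

The same fields could equally be expressed in schema.org terms; DCAT is used here only because it is the vocabulary named in the notes.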
API access
- Search records link directly to a landing page, with no direct connection to the data
- conventions for links from landing page to data?
- maybe add (dcat:)accessURL, (dcat:)downloadURL to the link-relations registry ... how are these related to the existing "describes" relation?
- information about format is rare (let alone schema!)
- (In principle) is supported by dct:type ... dct:conformsTo ... dcat:mediaType dcat:endpointDescription but these are not yet used broadly
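To illustrate the API access points above, here is a small sketch of how a client could get from a dataset description to the data itself, assuming the catalog publishes dcat:distribution entries that carry dcat:downloadURL, dcat:mediaType and dct:conformsTo. The helper names node_id and direct_data_links, and all example.org URLs, are hypothetical. Distributions that only give a dcat:accessURL (i.e. a landing page) are skipped, which is exactly the gap described above.

```python
def node_id(node):
    """Return the IRI of a JSON-LD node reference, or None if it is absent."""
    return node.get("@id") if isinstance(node, dict) else None


def direct_data_links(dataset):
    """List (download URL, media type, schema) for distributions with a direct link."""
    links = []
    for dist in dataset.get("dcat:distribution", []):
        download = node_id(dist.get("dcat:downloadURL"))
        if download is None:
            # Only an accessURL (landing page) is available: no direct connection to data.
            continue
        links.append(
            (
                download,
                node_id(dist.get("dcat:mediaType")),
                node_id(dist.get("dct:conformsTo")),
            )
        )
    return links


# Illustrative record; every URL under example.org is an assumption.
dataset = {
    "@type": "dcat:Dataset",
    "dcat:distribution": [
        {
            "dcat:accessURL": {"@id": "https://example.org/datasets/cases"},
            "dcat:downloadURL": {"@id": "https://example.org/data/cases.csv"},
            "dcat:mediaType": {"@id": "https://www.iana.org/assignments/media-types/text/csv"},
            "dct:conformsTo": {"@id": "https://example.org/schemas/cases-v1"},
        },
        {
            # Landing page only: format and schema are unknown to the client.
            "dcat:accessURL": {"@id": "https://example.org/datasets/cases/portal"},
        },
    ],
}

links = direct_data_links(dataset)
for url, media_type, schema in links:
    print(url, media_type, schema)
```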
"Are these datasets broadly compatible & relevant?"
- mechanics for cross-domain data harmonization?
- most/best information is usually in the dataset abstract i.e. text
- but ... abstract is always written for a specific audience
- are these datasets really describing the same thing?
- links to standard terminology - controlled vocabularies
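One rough way to approach the "same thing?" question above is to compare the controlled-vocabulary terms that two dataset records link to. The sketch below is only a weak signal under stated assumptions: it assumes both records use the same vocabularies, and the names TERM_PROPERTIES, term_uris and shared_terms, as well as the example.org term URIs, are hypothetical.

```python
# Properties whose values are expected to be controlled-vocabulary URIs.
TERM_PROPERTIES = ("dcat:theme", "sosa:observedProperty", "sosa:hasFeatureOfInterest")


def term_uris(record):
    """Collect the vocabulary URIs a record links to via TERM_PROPERTIES."""
    uris = set()
    for prop in TERM_PROPERTIES:
        value = record.get(prop, [])
        nodes = value if isinstance(value, list) else [value]
        uris.update(n["@id"] for n in nodes if isinstance(n, dict) and "@id" in n)
    return uris


def shared_terms(record_a, record_b):
    """Return the vocabulary URIs that both records reference."""
    return term_uris(record_a) & term_uris(record_b)


# Two illustrative records; the vocabulary URIs are assumptions.
health_agency = {
    "dcat:theme": [{"@id": "https://example.org/vocab/theme/health"}],
    "sosa:observedProperty": {"@id": "https://example.org/vocab/property/case-count"},
}
stats_agency = {
    "dcat:theme": [
        {"@id": "https://example.org/vocab/theme/health"},
        {"@id": "https://example.org/vocab/theme/population"},
    ],
    "sosa:observedProperty": {"@id": "https://example.org/vocab/property/mortality-rate"},
}

overlap = shared_terms(health_agency, stats_agency)
print(sorted(overlap))
```

An empty overlap does not prove the datasets are unrelated; it may simply mean the abstracts carry the meaning in free text, as noted above.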
Pilot - infectious diseases - data is generally scarce. Data with metadata is even scarcer ...
Business case for dataset- and catalog-owners (i.e. a subclass of webmasters) to make better metadata?
- Google listing and high ranking
- More data access by users