Group 03: DRR and Oceans
Copied from Google doc Group 03: DRR and Oceans 2022-09-02T14:13:00+02:00
Bapon (DRR), Pier Luigi (Oceans), Luis (SDGs), SimonC, SimonH, Pierre-Antoine, Milan.
Focus questions and function areas presentation.
End of workshop debrief
What have we achieved?
Initiated the development and deployment of cross-domain interoperability capacity (via JSON-LD and http://schema.org) between the broad ocean digital ecosystem emerging under IODE and the UN Ocean Decade, and the Disaster Risk and Impact domain
Developed initial strategy on advocating for LOD-native digital practices across IOC-UNESCO and UN Data, and identified priorities (e.g. Sendai indicators)
Identified (meta)data key to fitness-for-purpose in some disaster modelling applications (i.e. seismic data used as input for loss models) and included these in the specification of interoperable data exchange packages (i.e. ODIS-Arch patterns) xref Group 1.
Where are we now?
At a stage where we feel we can gather partners and attempt to implement and test multilateral (meta)data sharing to bridge our stakeholders and test CDIF as it matures/emerges
What will we do next?
ODIS/OIH team to refine and deploy drafted patterns, requesting trials with IODE Tsunami group, USGS, PDH (SPREP, SPC), UNDRR perhaps with WorldFAIR (Appendix I)
Attempt to secure reliable linked open data resources (e.g. SKOS thesauri) for
Sendai Monitoring Framework via UN Library and
UNDRR hazard typology / HIPS via CODATA
Get explicit green light - published with DOI - that any IRIs issued for these are recognised by UNDRR+ISC (who own the HIP definitions)
Check our approach and results against the maturing CDIF document and provide feedback
Bapon will lead a paper summarising our progress for potential publication in the CODATA Data Science Journal.
Discuss approaches to handle secure and sensitive data with Group 2
Please reflect on the final polished output and what it might look like (report, article) and how it will be finalised.
Abstract (in progress)
Disaster archives and loss data collection are fundamental for a comprehensive assessment of socially, temporally, and spatially disaggregated impact data. Risk interpretation, with standardized loss data, can be used in loss forecasting and historical loss modelling. These would provide valuable opportunities to acquire better information about the economic, ecological, and social cost of disasters, and to more rigorously collect data that can inform future policy, practice, and investment.
There is currently no coherence in disaster loss data collection and climate change loss and damage assessment. The three major UN frameworks (the Sendai Framework, the Paris Agreement, and the SDGs) generate enormous datasets to monitor their progress (e.g. 38 indicators in the Sendai Framework, 39 indicators in SDG-13). However, organisational silos, limited local capacity, broader uncertainty surrounding climate change impacts, inconsistent disaster loss data collection processes and adaptation measures, and the lack of data integration, metadata standards, schemas, interoperability, and data quality all hinder the baseline information needed to understand the vulnerability of elements at risk.
Lack of data profile
Schema
Emphasis of principle for domain
Data Structure
….
The challenge / problem statement
Difficult to find and access standardised resources needed to perform and evaluate disaster impact modelling
Lack of defined requirements for disaster risk and impact modelling in the ocean domain / for use by global infrastructure developers
Objectives
Develop (meta)data profiles that illustrate in practical terms how to enable interoperability between the Disaster risk and impact modelling domain and the generic Ocean digital ecosystem being created under ODIS.
Align to Sendai monitoring framework and the SDG process where relevant.
Implement guidance for the ocean and disaster community to more FAIRly share (meta)data to enable rapid access and reuse of datasets, models, and web services relevant to disaster monitoring and modelling.
Results
Using tsunamis as a case study:
Develop metadata profiles for
Datasets required for disaster impact modelling
Modelling software which consumes these datasets
Datasets which are the output of models, with provenance
Web services for monitoring disaster impact, etc.
These metadata profiles will provide guidance on how to improve the completeness and correctness of the semantic annotation of a given domain's data products, so that users from other domains can find the resources, understand their contents, access them, and use and re-use them in the creation of composite data products, etc.
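As a purely illustrative sketch of what such a profile could look like (this is not the agreed ODIS-Arch pattern; all names, values, and URLs below are placeholders), a dataset used as input to tsunami impact modelling might be described as:

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "@id": "https://example.org/datasets/tide-gauge-suva",
  "name": "Tide gauge records, Suva (placeholder)",
  "description": "Sea-level observations used as input to tsunami impact models.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "temporalCoverage": "2010-01-01/2020-12-31",
  "spatialCoverage": {
    "@type": "Place",
    "geo": { "@type": "GeoShape", "box": "-18.3 178.3 -18.0 178.6" }
  },
  "variableMeasured": {
    "@type": "PropertyValue",
    "name": "sea surface height",
    "propertyID": "https://example.org/vocab/sea-surface-height",
    "unitText": "m"
  },
  "distribution": {
    "@type": "DataDownload",
    "encodingFormat": "text/csv",
    "contentUrl": "https://example.org/data/tide-gauge-suva.csv"
  }
}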
Discussion
Cross-domain and generic standards can get us rather far
We were able to capture most of the key (meta)data required for finding, accessing, and integrating digital resources for disaster modelling (provided the sources serve it)
Note that some of the http://schema.org types and properties used here are not necessarily broadly used or supported
If it passes muster, this approach will be used by solutions such as OIH and subportals thereof.
Graph-based and atomic approaches open doors
Embedding of more domain specific content in a generic framework using URIs to resources that use domain vocabs/ontologies
Example: mapping PROV Activity and related fields to http://schema.org Actions etc. - the same shapes can be mimicked or mirrored between the two (see the side-by-side examples under "A metadata strategy" in Appendix II).
Mappings are key and non-trivial
Mapping between the generic to more regionally or domain specific conventions and standards is of vital importance to interoperability (and FAIR stuff too)
Mappings should be structured and machine-actionable
Mappings should be hosted separately (i.e. not embedded in the (meta)data being exchanged - to avoid inflation and enable re-use)
Mappings should be FAIR, and the parties responsible for creating the mapping identified.
The relationships used to map one class/term/type to another should be simple and reuse something like SKOS which is well-adopted and generic enough for most purposes.
Human readable qualifications on why a mapping is considered close/exact/narrow/etc should accompany each claim
The SSSOM conventions were marked of interest - SSSOM maps should be given PIDs to allow reuse. Example
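As a minimal sketch only (all terms and URLs are placeholders; a production mapping set would more likely be maintained as an SSSOM TSV file with its own PID), a single mapping claim with its human-readable qualification could be expressed in JSON-LD as:

{
  "@context": {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
  },
  "@id": "https://example.org/generic-profile/magnitude",
  "skos:closeMatch": { "@id": "https://example.org/domain-vocab/momentMagnitude" },
  "dcterms:creator": { "@id": "https://orcid.org/0000-0000-0000-0000" },
  "rdfs:comment": "Close, not exact: the domain term constrains the magnitude scale, the generic term does not."
}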
Granularity of inputs / outputs is key
Raw, level 0 data is rarely useful or of interest to anyone except deep domain experts
It is necessary to find the right level of aggregation / processing / analysis / summary / abstraction that is consumable and (re)usable by an audience
N.B. we can identify the intended audience type in our http://schema.org patterns, not specific to an agency - like “modellers”, “sensor developers”, “dashboard developers”
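For instance (a hedged sketch; the audienceType strings are illustrative), schema.org's audience property can carry this directly:

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Modelled tsunami amplitude grid (placeholder)",
  "audience": [
    { "@type": "Audience", "audienceType": "modellers" },
    { "@type": "Audience", "audienceType": "dashboard developers" }
  ]
}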
Access control needs definition
Sensitive and protected data - we need to know how to design a (meta)data chunk to trigger an authorization challenge/response and receive sensitive / secure output.
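A hedged sketch of what such a (meta)data chunk might advertise using existing schema.org properties (the endpoint URL and wording are placeholders; the challenge/response mechanics themselves sit outside the metadata):

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Historical loss records (restricted, placeholder)",
  "isAccessibleForFree": false,
  "conditionsOfAccess": "Access restricted; authorised government and research users only.",
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://example.org/api/loss-records",
    "description": "Endpoint returns an authorization challenge unless a valid token is supplied."
  }
}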
Semantic markup at many levels
The http://schema.org “about” and “propertyType” properties were used to semantically qualify Datasets, Software, and also component elements thereof (e.g. units, geodetic datums) using external vocabularies that were fit for this use case’s purposes.
Official/trusted IRIs are not available for key concepts such as Sendai indicators - proposal: as done with the SDGs, approach the UN Library to create these - consider sending a formal request from IOC and UNSD to get this done under http://metadata.un.org
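A hedged sketch of what this enables once such IRIs exist (the first IRI below reuses the metadata.un.org/sdg/ pattern cited under "Identifiers" in Appendix II purely as an illustration of form; the second is a placeholder for a future Sendai indicator IRI):

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Disaster impact dataset (placeholder)",
  "about": [
    { "@id": "http://metadata.un.org/sdg/C200303" },
    { "@id": "https://example.org/sendai/indicator/B-1" }
  ]
}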
The deep interoperability issue is still open
Our work allows more efficient discovery as well as sharing of some key (meta)data useful for immediate modelling (e.g., moment magnitude, spatial coverage, …), but the target datasets are mostly not interoperable or even available online.
This will take socialisation and activation of the relevant communities. We can attempt to do this via IODE Tsunami working groups and groups within the UNDRR, where standardisation efforts are already underway
TODOs
Patterns for spatial data asset interoperability were not examined in depth. Likely these will be just a variant of our Dataset patterns but with more listed spatial coverage polygons and points OR a sub-specification of spatial data.
Check our outcomes and approach against the CDIF once it stabilises.
Conclusion
We have generated actionable, cross-domain, and broadly applicable (meta)data profiles in JSON-LD + http://schema.org which are queued for implementation in IOC-UNESCO's ODIS Architecture (ODIS-Arch). These profiles address subtypes of datasets for data products (at the right level of processing) needed for models, the models themselves (as software applications), the results of the models, and web services that can be used by a variety of audiences.
The shapes of the profiles are informed by and consistent with some well-known ontologies, including PROV-O and SSN/SOSA for dataset metadata. Mappings to the more specialized ontologies (information models) will be provided using SSSOM.
Appendix I - ODIS-Arch JSON-LD/schema.org patterns
Live exchange and development here: New pattern: hazards and disasters · Issue #110 · iodepo/odis-arch
Appendix II - source notes and thoughts
Discussion
Bapon and PLB discuss how we prepare a data exchange space in ODIS for a disaster event, and to do so a priori so that we are not building it when the disaster comes.
ODIS has ‘patterns’ which list the requirements from a ‘community’ for data of a given type and domain. This could list what is necessary for data relating to a given disaster type.
New data-providers are encouraged to assemble a ‘community’ to reach agreement on the pattern.
Seismic data Earthquake Hazards - Data & Tools | U.S. Geological Survey
A new ODIS `pattern` will be
~75% well-known fields/slots
~25% new slots requested by the new ‘community’
E.g. request for Protected Areas - JSON-LD Pattern for Protected Areas · Issue #101 · iodepo/odis-arch
Example from Bapon.
GLOSS (Global Sea Level Observing System) data is in JSON-LD.
Sea surface anomaly data sets are produced by various agencies on the basis of models. How are these linked into ODIS?
What is the information content that ODIS requests to enable the functionality it provides?
Suggests looking into the internals of the ODIS requirements.
Metadata elements provided by protected seas.
Tsunami case study description (from Bapon)
Objective:
Deploy tsunami data exchange specification in ODIS and link in sources for application.
ODIS-Arch patterns for each of the steps identified below
Hazard domain data: In this case tsunami data, tide gauge, etc.
Hazard warning data: Created from a model using the historical tsunami data.
Topography data, bathymetry data, Asset data and historical loss data: With the hazard warning data, this feeds into a further model to provide loss predictions.
Loss predictions in different damage scenarios, generally published in written reports
Model outputs - amplitude (PROV injection to say where the raw data came from, information already in the standard model output format)
Data sources:
Most already use agreed formats for rapid exchange; interop may be part of the existing digital culture.
Hazard domain Data
USGS Global seismic data (trusted, well used). API which serves JSON-LD, semantic markup via http://schema.org
Oceania seismic data (not publicly open, access through dedicated software)
GLOSS (Global Sea Level Observing System) data, tide patterns
Wave propagation - regional sources; e.g., for Fiji, data comes from NZ MetOcean and the Bureau of Meteorology (not freely accessible - a licence or an agreement is needed)
Hazard warning data (model result data)
Created from the hazard domain data, using a model. The output is often an image, but also amplitude, location, speed.
The data described above gives an idea of the amplitude of the tsunami. Putting this data into a model yields the geolocation, amplitude, and speed of travel / travel time.
Must trace the IDs of raw data files that went into models
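One lightweight option is schema.org's isBasedOn on the output dataset (a sketch only; the identifiers below are placeholders, and the fuller PROV treatment appears in Appendix II):

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Tsunami warning model output (placeholder)",
  "isBasedOn": [
    "https://example.org/usgs/event/us7000abcd",
    "https://example.org/gloss/station/071/2014-03-01"
  ]
}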
Asset / loss domain data
Comprises datasets with assets, and historical datasets of loss. Observation / survey data.
Road data
{Fiji Road sourced from Fiji gov.in 2006
more detail can be found here:
T:\Geographics\TTGeographics\FIJIGIS DATA}
Boundary data {example metadata:
Fiji administrative boundary and population downloaded from
data credit:
Census data: Fiji Bureau of Statistics, 2007 Census of Population and Housing
Administrative boundaries: Fiji Bureau of Statistics, 2007 Census of Population and Housing
downloaded date:
05/07/2017}
Loss estimate
Output of a model which takes into account the hazard warning data (magnitude, speed, location) and the asset / loss data. An assessment is made of whether this is a 100 year / 50 year event. This allows an estimate of the likely loss to be made based on asset data and historical loss data.
Comes from data and model results from 1-3. Output of the analysis of the hazard warning data and the asset/loss data. Currently published as a document, but aspiration to publish as data.
Work to implement this in ODIS-Arch begins here: New pattern: hazards and disasters · Issue #110 · iodepo/odis-arch
Functional areas (from Arofan):
Findability
Identifiers (DOI, ORCID, ROR, etc.)
1: Experts know where to look, URLs available for the provider (their search interface) or distribution services, but not for the data records themselves. The files themselves do have unique identifiers, usually unique within the provider’s scheme, not usually globally unique.
2: Experts know where to look, URLs available for pages that have results
3: Much of the data is not findable as it isn’t digitised.
4: DOI. Published as a document with a DOI. (aspiration to have website)
Search and Discovery
1: Experts know where to look for the portal/discovery interface - providers have dedicated search interfaces or metadata catalogues
2: Experts know where to look - providers have dedicated search interfaces
3: Poor or no online search and discovery
4: None or internal / intranet of institutes only (sensitive data, can affect land values, insurance premiums (insurance companies own much of this data))
Access
1: APIs, FTP, HTTP(S)
2: FTP or API
3: HTTP(S)
4: No public domain, thus no public access protocol, intranet HTTP or similar
Hazard domain data
USGS - API serving JSON-LD, semantic markup via http://schema.org
Oceania - access through dedicated software, satellite transfer to a centralised hub. The country that owns the station, federated through this system, often stores the data relevant to it. Check with SPC/SPREP what they can provide.
GLOSS - API serving JSON-LD, http://schema.org
NZ MetOcean - FTP with NetCDF; a licence is needed to access the FTP - how do we find a specific file? How findable is it? The use case needs data for a specific date and ocean area (grid). Metadata is in the NetCDF file. This is a high-potential case for exposure of metadata through ODIS to advertise that NZ has data on this area, without sharing it (licence needed)
Aus Bureau of Met - API serves JSON-LD. Otherwise similar to NZ.
Hazard warning data / information
Models are run by a specific agency and exported into a PDF or GIS; the output is often an image. Used for application purposes. The underlying data can be accessed.
The tsunami warning service's obligation is to provide the warning comprising model outputs (PTWC for the Pacific), published through a website in a standard format fit for scraping
Assets / Loss domain data
Need to merge this with the assets data sets. What kernel metadata is needed for practitioners to find assets for response? Assets data comprises:
Asset data set (roads, buildings, schools, hospitals)
Fiji geo portal JSON-LD with shapefiles
Loss dataset is a historical dataset describing losses due to past tsunamis (dollar amount risk or differential element or number of deaths…) - these datasets are becoming more standardised through Sendai framework.
Loss prediction data
Loss prediction: presented in documents (sometimes with a DOI), no web server yet, but this is an aspiration. (ODIS has a documents pattern, which could be applied.) Data is licensable, but not accessible.
Cataloguing and Registration
See findability for related content. We understand this FA to be more formal - registration as a formal act that gives you a reliable identifier (you are now in a register and thus a catalogue). Answers below deal with registration.
1: Organisation registries, e.g. in USGS, that issue IDs and curate the data, and add new files to a metadata catalogue
2: N/A
3: Often no formal registration process, data obtained from various providers. Sometimes no metadata is stored.
4: Some have been published and have DOIs, much is in institutional registries (closed).
DCAT: which DCAT elements are used, which additional elements are required. Either adopt DCAT or map to DCAT.
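A minimal sketch of the "adopt DCAT" option (core DCAT/Dublin Core terms; the identifiers and URLs are placeholders):

{
  "@context": {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@id": "https://example.org/catalog/dataset/tide-gauge-suva",
  "@type": "dcat:Dataset",
  "dcterms:title": "Tide gauge records, Suva (placeholder)",
  "dcterms:identifier": "https://doi.org/10.0000/example",
  "dcterms:publisher": { "@id": "https://ror.org/example" },
  "dcat:distribution": {
    "@type": "dcat:Distribution",
    "dcat:downloadURL": { "@id": "https://example.org/data/tide-gauge-suva.csv" },
    "dcterms:format": "text/csv"
  }
}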
Assessment of Fitness for Purpose
The high-level purposes:
Understanding the hazard
Understanding current and future loss
Risk assessment drawing from oceanographic, socio-economic data (investment plans, adaptation options, …)
Understanding design standards / requirements to rebuild or plan
Environmental impact assessments
The granular purposes for the four data types:
1: Discover, subset for relevance, quality control, access, integrate, feed into models or other analyses
2: Find relevant models to use, match data available to model (e.g. may only have a few parameters, what’s the best model to use?), access them (docker containers, installations), run them to characterise tsunami, gather and quality control outputs/model results.
3: Find relevant histories and inventories, in the right format, with the right license, if needed - find data to construct inventory/history, find institutions who may have inventory not yet shared
4: Similar to 2, plus some degree of minimal (meta)data digitisation.
Dublin Core and Related Metadata Schema for Discovery/Cataloguing
Hazard domain Data
USGS Global seismic data (trusted, well used). API which serves JSON-LD, semantic markup via http://schema.org
Oceania seismic (not publicly open, access through dedicated software)
GLOSS (Global Sea Level Observing System) data
Wave propagation - regional sources; e.g., for Fiji, data comes from NZ MetOcean and the Bureau of Meteorology (not freely accessible - need to have a licence or an agreement)
Hazard warning data
Created from the hazard domain data, using a model. The output is often an image, but also amplitude, location, speed.
Must trace the IDs of raw data files that went into models
Issued by a respected service.
Asset / loss domain data
Comprises datasets with assets, and historical datasets of loss. Observation / survey data.
Of highly variable quality depending on the assets data held in a given country, location.
Loss estimate
Issued by a respected service.
Accessibility
Access Control
1: Noted above, some have license control, require an agreement.
2: Often, public information
3: Often not publicly accessible.
4: Rarely public information, also closed private and confidential
Retrieval - see
1: APIs, FTP, HTTP(S)
2: FTP or API
3: HTTP(S)
4: No public access protocol at the detailed level - intranet HTTP or similar; Sendai indicators, maybe.
Authority:
1: Public agencies / organisations
2: public and private orgs
3: Fully public orgs
4: Mostly private orgs, some cases public
—
Regarding map services:
What is a map service, feature service, and hosted feature layer?
Map service - data hosting service providing map images that can be rendered dynamically or pre-rendered in tiles in client applications.
Feature service - map service providing feature layers (to access spatial data) and tables (to access non-spatial data).
Feature layer - Data layer containing spatial feature data of same geometry type (point, polyline, or polygon)
Hosted feature layer - Reference to a feature layer in a feature service
Interoperability and Reusability
Structural Metadata
Structural metadata not explicit or not shared, rare cases where roles of attributes are encoded in names
While structural metadata is rare or rarely shared, common structures/serialisations include:
1: Mostly CSV (observation data)
2: netCDF + CF, but variable - also includes GRIB, T/CSV
3: shapefile + .dbf and/or GeoJSON (geospatial data)
4: PDFs, image files, CSV
Semantics
1: Controlled semantics sometimes used, sometimes with identifiers of various forms including dereferenceable URIs. ISO 19115:2003.
3: Code lists or controlled names of road types and other infrastructures on a national basis. Also likely for cause of death or injury.
4: None or rare.
What vocabularies, thesauri, ontologies are available and useful (how) here?
International semantics for asset types
International Standard Industrial Classification (ISIC) classes for disaggregating by economic sectors
Most sources are not harmonised semantically - manual work needed to map and integrate
Process and Provenance
How is provenance tracked?
Mostly annotations and free text or in netCDF description blocks
“Fully-Described” Observations (clusters of values to provide context)
What is the full specification of (meta)data
License:
1: Some open, Users often must agree before download. License data not often in files.
2: Users often must agree before download
3: Often not present
4: Users must agree before download
Reusability
UN Data
Treats all of the above data types as elements to organise in data cubes (space, time, attribute(s))
Needs:
Standardised geospatial + temporal (meta)data
Needs metadata to understand which data can be merged / compared / fed forward without or with minimal intervention or manual curation
Currently quite hard without expert knowledge, may require standardised fields which are good potential projections in an ODIS-Arch pattern.
Can we find key attributes to push
Clear link to SDG indicator 13.1.1 - deaths, missing persons, and directly affected persons is the indicator set - add to ODIS-Arch in measured and predicted flavours
Sendai framework indicators - these should be elevated to ODIS-Arch pattern for generic hazards and disasters
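A hedged sketch of how such attributes could be projected into an ODIS-Arch-style record (the indicator IRI and the measured/predicted qualifiers are placeholders for whatever the community agrees):

{
  "@context": "https://schema.org",
  "@type": "Dataset",
  "name": "Deaths and directly affected persons, tropical cyclone events (placeholder)",
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "name": "number of deaths (measured)",
      "propertyID": "https://example.org/sdg/indicator/13.1.1"
    },
    {
      "@type": "PropertyValue",
      "name": "number of deaths (predicted)",
      "propertyID": "https://example.org/sdg/indicator/13.1.1"
    }
  ],
  "spatialCoverage": { "@type": "Place", "name": "Fiji" },
  "temporalCoverage": "2016/2020"
}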
Resource Management
Fiji case study focuses on cyclones. Bridges ocean, atmosphere, terrestrial, development areas, etc - if we figure out tsunami and the generic pattern, we can take this on
In this case ‘seismic data’ is not original waveform data.
It is derived earthquake hypocentre data.
Tsunamigenesis is determined by (i) magnitude and (ii) depth of the earthquake (or submarine landslide).
Note - the earthquake ‘sensor’ is the seismometer-network+processing-system that produces the hypocentre estimate.
Is this a general pattern?
Important insight from Luis on use of data across domains for applications -
It is data products that are used in secondary applications.
I wanted to share some quick thoughts on the CDIF
Why cross-domain data interoperability?
Cross-domain interoperability is all about enabling collaboration across domains. More specifically, it's about enabling "cross-domain analytics" and "cross-domain value creation".
Principle of “Domain ownership”
Data ownership and accountability should lie with the people who are most familiar with it (those who are the "first-class" users or are in control of its point of origin).
Cross-domain interoperability guidelines should then enable domain owners to easily publish analytical views of their own data in ways that communicate “just enough" meaning and provenance to make collaboration with other domains and with end data consumers possible.
Microdata or Macrodata cross-domain integration?
Cross-domain interoperability, at least in the short term, is mostly about the integration of analytic views of domain-specific data
Interoperable analytic views of domain-specific data are the foundation for cross-domain visualizations, reports, and holistic insights into policy or business decisions, etc.
Analytic views of domain-specific data need to be sufficiently aggregated so they can be readily used for a more or less broad range of predictive or diagnostic use cases --> there will be some loss of information but huge gains in usability
In practice, domain-specific data pipelines that transform micro-data into analytic data need to be handled internally within each domain. It's usually more practical to keep them abstracted from external consumers.
Domain teams / experts should be encouraged to serve analytic views of their own data to other domains and to end users, and ideally be provided with "self-service" tools to facilitate that.
Think not of
"cross-domain interoperable datasets",
but rather of
"cross-domain interoperable data products or services".
Identifiers needed for cross-domain data interoperability
Global identifiers are needed / could be useful for a number of domain areas:
SDGs: Available from the UN Library at: metadata.un.org/sdg/
For example: http://metadata.un.org/sdg/C200303?lang=en
See comments under “Challenges”
A metadata strategy
Take established models and ontologies (rigorous, mature), and use these to guide implementation in Schema.org (pragmatic, developer and tool support).
Provenance
Use PROV-O patterns
Observations, forecasts, predictions, assessments
Use SSN/SOSA patterns, implement in Schema.org - https://github.com/schemaorg/schemaorg/issues/2564
W3C/OGC SSN/SOSA or PROV-O:

<https://example.org/obs/O-BG-78x> a sosa:Observation ;
    rdfs:label "O-BG-78x" ;
    prov:wasAssociatedWith <https://orcid.org/0000-0009-3899-3499> ;
    sosa:hasFeatureOfInterest <https://www.wikidata.org/wiki/Q4931215> ;
    sosa:hasSimpleResult "4.7 m"^^cdt:ucum ;
    sosa:madeBySensor <https://www.bunnings.com.au/stanley-fatmax-30m-tape-measure_p5667609> ;
    sosa:observedProperty <https://www.wikidata.org/wiki/Q973582> ;
    sosa:phenomenonTime [
        a time:Interval ;
        time:hasBeginning [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-01T11:15:00Z"^^xsd:dateTimeStamp ] ;
        time:hasEnd [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-01T11:30:00Z"^^xsd:dateTimeStamp ] ;
    ] ;
    sosa:resultTime "2014-03-01T11:30:00Z"^^xsd:dateTimeStamp ;
    sosa:usedProcedure [
        a sosa:Procedure ;
        dcterms:description "directly measure circumference then divide by pi" ;
    ] ;
.

schema.org:

{
  "@context": "https://schema.org",
  "@type": "Observation",
  "name": "O-BG-78x",
  "actionStatus": "CompletedActionStatus",
  "object": {
    "@type": "Thing",
    "name": "Boab prison tree, Derby",
    "url": "https://www.wikidata.org/wiki/Q4931215"
  },
  "measuredProperty": {
    "@type": "Property",
    "name": "Diameter at breast height",
    "url": "https://www.wikidata.org/wiki/Q973582"
  },
  "measurementTechnique": "directly measure circumference then divide by pi",
  "tool": {
    "@type": "Thing",
    "name": "Stanley Fatmax 30m tape measure",
    "url": "https://www.bunnings.com.au/stanley-fatmax-30m-tape-measure_p5667609"
  },
  "agent": {
    "@type": "Person",
    "name": "Frances Peel",
    "url": "https://orcid.org/0000-0009-3899-3499"
  },
  "endTime": "2014-03-01T11:30:00Z",
  "temporalCoverage": "2014-03-01T11:15:00Z/2014-03-01T11:30:00Z",
  "result": {
    "@type": "PropertyValue",
    "name": "Length",
    "value": "4.7 m"
  }
}

W3C/OGC SSN/SOSA or PROV-O:

<https://example.org/obs/T-loss-78x> a sosa:Observation ;
    rdfs:label "T-loss-78x" ;
    sosa:hasFeatureOfInterest <https://www.pacifictsunami.org/fiji> ;
    sosa:observedProperty [ a sosa:ObservableProperty ; rdfs:comment "a loss estimate spatial distribution" ] ;
    sosa:usedProcedure <https://riskscape.org.nz> ;
    sosa:madeBySensor [ dcterms:description "Windows 7-10, implementation of riskscape" ] ;
    prov:used [ dcterms:description "copy details from github issue 110" ] ;
    prov:used [ rdfs:comment "input dataset 2" ] ;
    prov:used [ rdfs:comment "input dataset 3" ] ;
    prov:used [ rdfs:comment "input dataset 4" ] ;
    prov:wasAssociatedWith <https://orcid.org/0000-0001-6904-6013> ;
    sosa:resultTime "2015-06-01T17:30:00Z"^^xsd:dateTimeStamp ;
    sosa:phenomenonTime [
        a time:Interval ;
        rdfs:comment "the time the loss happened" ;
        time:hasBeginning [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-01T00:00:00Z"^^xsd:dateTimeStamp ] ;
        time:hasEnd [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-15T23:59:00Z"^^xsd:dateTimeStamp ] ;
    ] ;
    sosa:hasResult [ rdfs:comment "the geographically distributed loss estimate dataset" ] ;
.

schema.org:

{
  "@context": "https://schema.org",
  "@type": "Action",
  "name": "T-Loss-78x",
  "object": {
    "@type": "Thing",
    "name": "the geographic region affected",
    "url": "https://example.org/Q4931215"
  },
  "measuredProperty": {
    "@type": "Property",
    "name": "a loss estimate",
    "url": "https://example.org/Q973582"
  },
  "measurementTechnique": "the model or algorithm",
  "tool": {
    "@type": "Thing",
    "name": "the software and platform implementing the model (software)",
    "url": "https://example.org/p5667609"
  },
  "agent": {
    "@type": "Person",
    "name": "Bapon Fakhruddin",
    "url": "https://orcid.org/0000-0001-6904-6013"
  },
  "endTime": "2015-06-01T17:30:00Z",
  "temporalCoverage": "2014-03-01T00:00:00Z/2014-03-15T23:59:00Z",
  "result": {
    "@type": "PropertyValue",
    "name": "netCDF",
    "downloadUrl": "https://example.org/T-Loss-78x"
  }
}

How to record the inputs? isBasedOn, supportingData?

W3C/OGC SSN/SOSA or PROV-O:

<https://example.org/T-loss-78x> a prov:Entity ;
    dcterms:spatial [ rdfs:comment "the geographic region affected" ] ;
    dcterms:temporal [
        a time:Interval ;
        rdfs:comment "the time the loss happened" ;
        time:hasBeginning [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-01T00:00:00Z"^^xsd:dateTimeStamp ] ;
        time:hasEnd [ a time:Instant ; time:inXSDDateTimeStamp "2014-03-15T23:59:00Z"^^xsd:dateTimeStamp ] ;
    ] ;
    prov:generatedAtTime "2015-06-01T17:30:00Z"^^xsd:dateTime ;
    prov:wasAttributedTo <https://orcid.org/0000-0001-6904-6013> ;
    prov:wasAttributedTo [ a prov:SoftwareAgent ; rdfs:comment "the software and platform implementing the model (software)" ] ;
    prov:wasGeneratedBy <https://example.org/T-loss-model-78x> ;
    foaf:primaryTopic <https://example.org/netCDF/T-loss-78x> ;
.

<https://example.org/T-loss-model-78x> a prov:Activity ;
    rdfs:label "T-loss-78x" ;
    prov:endedAtTime "2015-06-01T17:30:00Z"^^xsd:dateTime ;
    prov:generated <https://example.org/T-loss-78x> ;
    prov:used [ a prov:Plan ; dcterms:description "the model or algorithm" ] ;
    prov:used [ rdfs:comment "input dataset 1" ] ;
    prov:used [ rdfs:comment "input dataset 2" ] ;
    prov:used [ rdfs:comment "input dataset 3" ] ;
    prov:used [ rdfs:comment "input dataset 4" ] ;
    prov:wasAssociatedWith <https://orcid.org/0000-0001-6904-6013> ;
    prov:wasAssociatedWith [ a prov:SoftwareAgent ; rdfs:comment "the software and platform implementing the model (software)" ] ;
.
Challenges:
Gaps in Schema.org, esp around service description, for the ODIS configuration
Finding IRIs for key classifiers
Model (simulation) characterization, classification, description (e.g. loss estimate models, disaster prediction models)
Access to loss estimations - scattered, not yet digitised, discovery, …
Need for IRIs that enable linking to major global policy frameworks (see, e.g., https://unsceb.org/common-digital-identifiers-sdgs and https://unsceb.org/unsif-akn4un )
Use of PROV for describing modeling workflows (CSIRO RRAP project)
Actual example from CSIRO RRAP Datastore (prototype) - the output dataset is the orange circle on the RHS.
There is a need for some light-weight vocabulary to tag models or algorithms by type.
For instance, from https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet :
Text analytics
Regression models
Predictors
Anomaly detection models
Clustering models
Multi-class classification models
Two-class classification models
Image classification models
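A hedged sketch of how such a tag could be carried on a model described as a schema.org SoftwareApplication (the category strings and URL are placeholders; the controlled list itself is still to be chosen):

{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Tsunami loss estimation model (placeholder)",
  "applicationCategory": "Regression model",
  "applicationSubCategory": "Loss estimation",
  "softwareHelp": { "@type": "CreativeWork", "url": "https://example.org/docs/loss-model" }
}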
To illustrate one practical (potential) use case for a domain exchange specification for “models”, it may be useful to refer to the UN Global Platform for Official Statistics ( https://unstats.un.org/bigdata/un-global-platform.cshtml )
The UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD) is building a cloud-service ecosystem called “UN Global Platform” to support international collaboration in the development of Official Statistics using new data sources and innovative methods, and to help countries measure the Sustainable Development Goals (SDGs) to deliver the 2030 Sustainable Development Agenda.
The UN Global Platform’s Methods service allows users to publish, find, and use trusted methods and algorithms that are available through API callable microservices.