2021 Interoperability for Cross-Domain Research: Use Cases for Metadata Standards
Introduction
This workshop builds on the outcomes of two previous Dagstuhl workshops, held in 2018 and 2019, on the alignment of standards and technologies for cross-domain data combination. These workshops produced draft guidelines and use case documentation that provide insight into the cross-domain challenges at the focus of the ISC CODATA Decadal Programme ‘Making Data Work for Cross-Domain Grand Challenges’.
Due to the COVID-19 pandemic, the workshop will be conducted in a hybrid format, both in person and virtually.
Scope and Background
To face many of today’s global grand challenges, we need data from different domains and disciplines and from different institutional levels, and that data must be interoperable to be useful. Research projects in such fields, whether for policy or scientific purposes, often draw on a wide variety of sources, ranging from specific, local data sets to those supplied by national and international organizations. A large proportion of research effort is spent integrating and harmonizing this data so that a meaningful analysis can be conducted.
This diversity of sources raises diverse issues. The workshop focusses on a set of use cases that exemplify these issues and the solutions that will be required to enable FAIR data sharing within and across domains, a capability we need if we are to meet these challenges.
Topics and Activities
Possible deliverables and activities for each workshop topic are described below. They are expected to evolve during the course of the work, but they provide initial direction and a starting point.
This set of use cases is intended to provide the basis for guidance to research communities looking to support FAIR data sharing within and across domains. As such, the use cases will be further distilled to provide input for the CODATA Decadal Programme and other global initiatives.
Core Interoperability Toolkit/Framework
This use case is conceived as a cross-cutting activity for the workshop. It looks at a set of cross-domain standards and models that would support the FAIR sharing of data across domains. This work has already started but has not yet progressed beyond initial discussions. The aim in this workshop is to draft a more concrete proposal for how the needed range of functions could be supported and which models/standards could be used. This activity will engage with the other use cases as far as possible, in order to support the requirements they identify.
Deliverables:
Documented requirements analysis for generic data-sharing across domain boundaries, and identification of target models/standards for providing support for these functions
Recommendations for external organizations working on relevant specifications (DDI Alliance, GO FAIR, etc.)
Activities:
Review of existing ideas and standards/models (FDOF, DDI-CDI, DCAT, Schema.org, I-ADOPT, PROV-O, etc.)
Analysis of requirements and needed functional coverage, including interactions with domain standards and the other use cases in the workshop
Identification of candidate models/standards in each identified area
Documentation of deliverables.
Helmholtz Metadata Collaboration Use Case
The Helmholtz Metadata Collaboration (HMC) promotes the qualitative enrichment of research data by means of metadata, and implements this approach across the whole organization. This case study focuses on oceanographers and earth scientists in Helmholtz centres who want to enable FAIR data sharing within their community. They have already identified an initial approach based on Schema.org and implemented in JSON-LD. They are now looking at DDI-CDI for a more granular description of data to support integration and reuse; this work has started but is still exploratory.
Deliverables:
Draft community implementation guide, showing what metadata elements will be used from the DDI-CDI model, how these will be implemented syntactically, and how these will relate to other standards and models (e.g., Schema.org, ontologies, etc.)
Documented examples of data within the domain
Activities:
Presentation of work to date
Identification of information requirements for outputs/exemplary data sets
Identification of needed functionality around the model
Documentation of examples and community guidance
European Social Survey Use Case
This use case looks at the ESS Multi-Level application, and at how DDI-CDI and other elements of the core interoperability framework could be used together with existing metadata standards and applications to improve the integration of ESS data with health and environmental data and with other context variables from supra-national sources. The focus will be on leveraging the datum-oriented approach and the process components of DDI-CDI to understand how automation could be increased and other efficiency gains realized.
Deliverables:
Documented analysis of target areas for the application of DDI-CDI and functional requirements/needed metadata
Documented analysis of metadata flows, including those from existing metadata to a more granular framework to support the target application
Description of the use case from a business perspective: costs and benefits
Activities:
Presentation of current practice including review of the EOSC description, and discussion/exploration of the issues at a greater level of detail
Identification of target functionality and assessment of efficiency gains to be realized
Identification of metadata requirements and needed transformation from existing metadata holdings
Brainstorming the implementation strategy, both at the level of institutional processes and at the level of technology application
Documentation of these discussions into deliverables
Smart Energy Research Laboratory Use Case
The SERL example shows how manual integration across domain boundaries, at the data-set level, can be improved. This use case will look at how the granular management and dissemination of data, and the automation of some integration functions might be realized. It will also reflect on the requirements for data services to support variable subsetting to meet evolving research needs, especially in the area of data reuse across domain boundaries.
Deliverables:
Documentation of the SERL case and desired improvements based on granular (meta)data management
Analysis and recommendations on how existing data-set-oriented systems could move to a more granular framework, including variable-level subsetting and similar approaches
Activities:
Presentation of the existing SERL and review of available metadata and data
Review of the analysis of data-set-level dissemination and management, and issues which have been identified
Brainstorm possible solutions to the problems presented (e.g., variable-level dissemination, automation, etc.)
Document approaches to solving these problems, in both the near and the longer term.
InterStat/NGSI-LD Use Case
The InterStat project is looking at incorporating official statistical data and other types of research data into general public information produced by European governmental bodies. The focus is on bridging the domain-specific models and standards that support the information channels used by the European Commission (EC). This spans SDMX, DataCube, the NGSI-LD domains, DDI-CDI, SOSA-SSN, and potentially other standards.
Deliverables:
Documentation of requirements and vision for dissemination of statistical and research data within the European system as part of a coherent flow of open public information
Technical documentation of mappings and use of metamodels and models to support these requirements
Activities:
Presentation of work to date and review of the relevant standards and models
Consideration of possible use cases for combining data from different sources and combining data with other types of information
Analysis of the needed coverage among the metamodels and domain models involved, and description of the needed mappings between these standards and models
Document intended support and use of these models in InterStat scenarios.
ENVRI-FAIR Use Case (Proposed)
This is a new proposed use case, which will be discussed and further described in a meeting before the workshop. The idea is to explore how the available standards and models – including the core interoperability framework, as well as relevant domain standards – could be combined to address challenges in data sharing within and outside of the ENVRI-FAIR community.
Deliverables:
Document requirements and vision around FAIR data sharing, specifying what would be possible and what the longer-term goals would be for the community
Produce a FAIR Implementation Profile (FIP) or other statement of what target standards and technologies will be employed
Activities:
Review the elements of the core interoperability framework
Examine the state of play within the domain regarding metadata and data management, including standards and technology approaches
Identify goals and requirements for FAIR sharing of data, including timelines
Draft FIPs or similar documentation
Document goals and requirements and technology approaches/standards
EOSC Life Use Case (Proposed)
This is also a new proposed use case, which will be discussed and further described in a meeting before the workshop. The idea is to explore how the available standards and models – including the core interoperability framework, as well as relevant domain standards – could be combined to address challenges in data sharing within and outside of the EOSC Life community.
Deliverables:
Document requirements and vision around FAIR data sharing, specifying what would be possible and what the longer-term goals would be for the community
Produce a FIP or other statement of what target standards and technologies will be employed
Activities:
Review the elements of the core interoperability framework
Examine the range of data and potential forms of reuse across the fields of genomics, molecular biology, health sciences, etc.
Identify relevant standards and models for interchange of data within this set of domains
Draft FIPs or similar documentation
Document goals and requirements and technology approaches/standards
Workshop summary
Date and Location
The workshop takes place at Schloss Dagstuhl – Leibniz Center for Informatics on September 27 to October 1, 2021. It has the Dagstuhl event number 21393 and a related web page.
See the separate pages for practical information and for information about COVID-19.
Use Cases
Core Interoperability Toolkit/Framework
Helmholtz Metadata Collaboration (HMC) Use Case
European Social Survey (ESS) Use Case
Smart Energy Research Lab (SERL) Use Case
InterStat/NGSI-LD Use Case
ENVRI-FAIR Use Case (Proposed)
EOSC Life Use Case (Proposed)
Related Material
CODATA’s Decadal Programme ‘Making Data Work for Cross-Domain Grand Challenges’
Page with links to the EOSC report and DDI-CDI documentation/specification
DDI-CDI Webinars (including slides and recordings)
Organizers and Participants
Organizers
Simon Cox, CSIRO Australia and W3C Dataset Exchange Working Group
Arofan Gregory, Consultant and DDI Alliance
Simon Hodson, CODATA - Committee on Data of the International Science Council (ISC)
Steven McEachern, Australian National University and DDI Alliance
Hilde Orten, Norwegian Centre for Research Data (NSD) and DDI Alliance
Joachim Wackerow, GESIS - Leibniz Institute for the Social Sciences and DDI Alliance
Participants list (TBA)