2021 Interoperability for Cross-Domain Research: Use Cases for Metadata Standards


Introduction

This workshop builds on the outcomes of two previous Dagstuhl Workshops in 2018 and 2019 on the alignment of standards and technologies for cross-domain data combination. The first two workshops in this series have produced draft guidelines and use case documentation to provide insight into the cross-domain challenges which form the focus of the ISC CODATA Decadal Programme on ‘Making Data Work for Cross-Domain Grand Challenges’.

Due to the COVID-19 pandemic, the workshop will be conducted both in person and virtually.

Scope and Background

To face many of today’s global grand challenges, data is needed from different domains and disciplines, and from different institutional levels, and it must be interoperable to be useful. Research projects in such fields, whether for policy or scientific purposes, often involve the use of data from a wide variety of sources, ranging from specific, local data sets to those supplied by higher-level national and international organizations. A huge proportion of research effort is expended to integrate and harmonize this data so that a meaningful analysis can be conducted.

Combining data from such a wide range of domains and institutional levels raises diverse issues. This workshop focusses on a set of use cases which exemplify the kinds of issues, and the solutions, that must be addressed to enable sharing of FAIR data within and across domains. Such capability is necessary if we are to meet these challenges.

Topics and Activities

Possible deliverables and activities for each workshop topic are described below. These are expected to evolve during the course of the work, but they provide initial direction and a starting point.

This set of use cases is intended to provide the basis for guidance to research communities looking to support FAIR data sharing within and across domains. As such the use cases will be further distilled to provide valuable input for the CODATA Decadal Programme and other global initiatives.

Core Interoperability Toolkit/Framework

This use case is conceived as a cross-cutting activity for the workshop and looks at a set of cross-domain standards and models which would support the FAIR sharing of data across domains. This work has already started but has yet to progress beyond initial discussions. The idea in this workshop is to draft a more concrete proposal about how the needed range of functions could be supported, and which models/standards could be used. This activity will engage with all the other use cases as far as possible, in order to support identified requirements.

Deliverables:

  • Documented requirements analysis for generic data-sharing across domain boundaries, and identification of target models/standards for providing support for these functions

  • Recommendation for external organizations working on relevant specifications (DDI Alliance, GO FAIR, etc.)

Activities:

  • Review of existing ideas and standards/models (FDOF, DDI-CDI, DCAT, Schema.org, I-ADOPT, PROV-O, etc.)

  • Analysis of requirements and needed functional coverage, including interactions with domain standards and the other use cases in the workshop

  • Identification of candidate models/standards in each identified area

  • Documentation of deliverables.

Helmholtz Metadata Collaboration Use Case

The Helmholtz Metadata Collaboration (HMC) promotes the qualitative enrichment of research data by means of metadata – and implements this approach across the whole organization. This case study focuses on oceanographers and earth scientists in Helmholtz centres looking to enable FAIR data sharing within their community. They have already identified an initial approach, based on Schema.org and implemented using JSON-LD. Now they are looking at a more granular description of data to support integration and reuse, using DDI-CDI. This work has started, but is still exploratory.

Deliverables:

  • Draft community implementation guide, showing what metadata elements will be used from the DDI-CDI model, how these will be implemented syntactically, and how these will relate to other standards and models (e.g., Schema.org, ontologies, etc.)

  • Documented examples of data within the domain

Activities:

  • Presentation of work to date

  • Identification of information requirements for outputs/exemplary data sets

  • Identification of needed functionality around the model

  • Documentation of examples and community guidance
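
As an illustration of the initial approach described above (Schema.org implemented using JSON-LD), the following is a minimal sketch in Python. It is not taken from HMC practice: the function name, the data set name, and the variable names and units are hypothetical and chosen only for illustration. Only the Schema.org terms themselves (Dataset, variableMeasured, PropertyValue, unitText) are existing vocabulary.

```python
import json


def build_dataset_description(name, description, variables):
    """Return a Schema.org Dataset description as a JSON-LD-ready dict.

    `variables` is a list of (variable name, unit text) pairs.
    This helper is hypothetical and only illustrates the general shape.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Each measured variable is described as a Schema.org PropertyValue.
        "variableMeasured": [
            {"@type": "PropertyValue", "name": var_name, "unitText": unit}
            for var_name, unit in variables
        ],
    }


# Hypothetical example values, not an actual HMC data set.
record = build_dataset_description(
    name="Example ocean profile (hypothetical)",
    description="Illustrative record only.",
    variables=[
        ("sea_water_temperature", "degree Celsius"),
        ("sea_water_salinity", "practical salinity unit"),
    ],
)

print(json.dumps(record, indent=2))
```

A description at this level says which variables a data set contains but little about how individual values are structured, which is the gap the more granular DDI-CDI description mentioned above is meant to explore.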

European Social Survey Use Case

This use case looks at the ESS Multi-Level application, and how DDI-CDI and other elements of the core interoperability framework could be used in combination with existing metadata standards and applications to improve the integration of ESS data with data on health and the environment, and with other context variables from supra-national sources. The focus will be on leveraging the datum-oriented approach within DDI-CDI, and the process components, to understand how automation could be increased and other efficiency gains realized.

Deliverables:

  • Documented analysis of target areas for the application of DDI-CDI and functional requirements/needed metadata

  • Documented analysis of metadata flows, including those from existing metadata to a more granular framework to support the target application

  • Description of the use case from a business perspective: costs and benefits

Activities:

  • Presentation of current practice including review of the EOSC description, and discussion/exploration of the issues at a greater level of detail

  • Identification of target functionality and assessment of efficiency gains to be realized

  • Identification of metadata requirements and needed transformation from existing metadata holdings

  • Brainstorming the implementation strategy both at the level of institution process and technology application

  • Documentation of these discussions into deliverables

Smart Energy Research Laboratory Use Case

The SERL example shows how manual integration across domain boundaries, at the data-set level, can be improved. This use case will look at how the granular management and dissemination of data, and the automation of some integration functions, might be realized. It will also reflect on the requirements for data services to support variable subsetting to meet evolving research needs, especially in the area of data reuse across domain boundaries.

Deliverables:

  • Documentation of the SERL case and desired improvements based on granular (meta)data management

  • Analysis and recommendations on how existing data-set-oriented systems could move to a more granular framework, including variable-level sub-setting and similar approaches

Activities:

  • Presentation of the existing SERL and review of available metadata and data

  • Review of the analysis of data-set-level dissemination and management, and issues which have been identified

  • Brainstorm possible solutions to the problems presented (e.g., variable-level dissemination, automation, etc.)

  • Document approaches to solving these problems, in both the near- and longer term.
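
To make the idea of variable-level dissemination more concrete, here is a minimal sketch in Python. It is an assumption-laden illustration, not a description of any SERL system: the function name, the record layout, the variable names, and the metadata fields are all hypothetical. It shows a service returning only the requested variables together with their variable-level metadata, instead of the whole data set.

```python
def subset_variables(records, variable_metadata, wanted):
    """Return only the requested variables and their metadata.

    records: list of dicts, one per observation (hypothetical layout).
    variable_metadata: dict mapping variable name to a metadata dict.
    wanted: list of variable names requested by the researcher.
    """
    missing = [v for v in wanted if v not in variable_metadata]
    if missing:
        raise KeyError(f"Unknown variables: {missing}")
    # Keep only the requested columns in each observation.
    data = [{v: row[v] for v in wanted} for row in records]
    # Ship the matching variable-level metadata with the subset.
    metadata = {v: variable_metadata[v] for v in wanted}
    return data, metadata


# Hypothetical smart-meter readings and variable descriptions.
readings = [
    {"household_id": "H1", "electricity_kwh": 0.21, "gas_kwh": 1.30},
    {"household_id": "H2", "electricity_kwh": 0.35, "gas_kwh": 0.90},
]
metadata = {
    "household_id": {"label": "Household identifier", "unit": None},
    "electricity_kwh": {"label": "Electricity consumption", "unit": "kWh"},
    "gas_kwh": {"label": "Gas consumption", "unit": "kWh"},
}

subset, subset_meta = subset_variables(
    readings, metadata, ["household_id", "electricity_kwh"]
)
print(subset)
print(subset_meta)
```

Keeping the metadata attached to each variable, as opposed to the data set as a whole, is what would allow the subset to be reused and integrated without reference back to the full file.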

InterStat/NGSI-LD Use Case

The InterStat project is looking at incorporating official statistical data and other types of research data into general public information produced by European governmental bodies. The focus is on bridging across the set of domain specific models and standards to support the information channels being employed by the EC. This spans SDMX, DataCube, the NGSI-LD domains, DDI-CDI, SOSA-SSN, and potentially other standards.

Deliverables:

  • Documentation of requirements and vision for dissemination of statistical and research data within the European system as part of a coherent flow of open public information

  • Technical documentation of mappings and use of metamodels and models to support these requirements

Activities:

  • Presentation of work to date and review of the relevant standards and models

  • Consideration of possible use cases for combining data from different sources and combining data with other types of information

  • Analysis of the needed coverage among the metamodels and domain models involved, and description of the needed mappings between these standards and models

  • Document intended support and use of these models in InterStat scenarios.

ENVRI-FAIR Use Case (Proposed)

This is a new proposed use case, which will be discussed and further described in a meeting before the workshop. The idea is to explore how the available standards and models – including the core interoperability framework, as well as relevant domain standards – could be combined to address challenges in data sharing within and outside of the ENVRI-FAIR community.

Deliverables:

  • Document requirements and vision around FAIR data sharing, specifying what would be possible and what the longer-term goals would be for the community

  • Produce a FIP or other statement of what target standards and technologies will be employed

Activities:

  • Review the elements of the core interoperability framework

  • Examine the state of play within the domain regarding metadata and data management, including standards and technology approaches

  • Identify goals and requirements for FAIR sharing of data, including timelines

  • Draft FIPs or similar documentation

  • Document goals and requirements and technology approaches/standards

EOSC Life Use Case (Proposed)

This is also a new proposed use case, which will be discussed and further described in a meeting before the workshop. The idea is to explore how the available standards and models – including the core interoperability framework, as well as relevant domain standards – could be combined to address challenges in data sharing within and outside of the EOSC Life community.

Deliverables:

  • Document requirements and vision around FAIR data sharing, specifying what would be possible and what the longer-term goals would be for the community

  • Produce a FIP or other statement of what target standards and technologies will be employed

Activities:

  • Review the elements of the core interoperability framework

  • Examine the range of data and potential forms of reuse across the fields of genomics, molecular biology, health sciences, etc.

  • Identify relevant standards and models for interchange of data within this set of domains

  • Draft FIPs or similar documentation

  • Document goals and requirements and technology approaches/standards

 


Date and Location

The workshop takes place at Schloss Dagstuhl – Leibniz Center for Informatics from September 27 to October 1, 2021. It has the Dagstuhl event number 21393 and a related web page.

See the separate pages with practical information and information about COVID-19.


Use Cases

  • Core Interoperability Toolkit/Framework

  • Helmholtz Metadata Collaboration (HMC) Use Case

  • European Social Survey (ESS) Use Case

  • Smart Energy Research Lab (SERL) Use Case

  • InterStat/NGSI-LD Use Case

  • ENVRI-FAIR Use Case (Proposed)

  • EOSC Life Use Case (Proposed)


DDI-CDI Webinars (including slides and recordings)


Organizers and Participants

Organizers

  • Simon Cox, CSIRO Australia and W3C Dataset Exchange Working Group

  • Arofan Gregory, Consultant and DDI Alliance

  • Simon Hodson, CODATA - Committee on Data of the International Science Council (ISC)

  • Steven McEachern, Australian National University and DDI Alliance

  • Hilde Orten, Norwegian Centre for Research Data (NSD) and DDI Alliance

  • Joachim Wackerow, GESIS - Leibniz Institute for the Social Sciences and DDI Alliance

Participants list (tba)