Workshop Description (2019)

Addressing global scientific challenges that depend on cross-discipline integration remains difficult. The challenge is to make cross-discipline data integration a routine part of data-driven science. Shared vocabularies and metadata specifications are vital tools for integrating and semantically linking data within and between disciplines. Yet standards tend to be developed and adopted within individual disciplines or application domains, with little consideration of cross-discipline requirements and technologies. The goal of this workshop is to identify a combination of vocabularies and technical specifications that will apply broadly across domain boundaries.

This workshop builds on the outcomes of a first Dagstuhl meeting in 2018 (see the 2018 description and report), further exploring how metadata standards can best support interdisciplinary research projects. The 2018 event identified commonalities between several relevant standards and charted a direction for future work. The 2019 workshop will be more technical, examining detailed aspects of the identified approaches and providing specific examples in its outputs.

Outcomes will include a detailed description of the steps involved in working with data, i.e. discovery, harmonization, preparation for analysis, and publication for reuse. An outline of this process view will be supported by more detailed documents covering each phase. These documents will describe current practice and the improvements that agreed metadata standards could bring for data providers and researchers. Often several standards or approaches are already in use or available for a given phase. The pros and cons of each standard at each stage should be discussed in terms of coverage, acceptance, popularity, software support, and development potential. A set of projects drawn from those covered in the 2018 workshop will provide concrete case studies.

This work will serve as the basis for a draft set of guidelines for cross-domain projects, to be promoted more widely. Both the specific projects and more generic requirements will be addressed.

The work will cover the following areas:

Discovery of data by human and programmatic search.
Description of provenance.
Harmonization of domain-specific data and metadata to build an integrated data repository for a project.
Preparation of the data/metadata for analysis.
Publication of data (for secondary use).

The focus will be on two or three case studies. Each will need to be represented by two well-informed individuals: a senior researcher or decision maker who understands the research questions and theoretical concepts addressed by the case study, and an expert in the data itself, with deep knowledge of the data sources, data formats, thesauri, code lists, relevant metadata specifications, and software.

In addition to the case study representatives, the workshop will be attended by experts in various metadata standards who are open-minded and able to see how the different standards could work together to achieve an optimal result. The workshop will also include people who can quickly produce examples of data and workflow descriptions in the formal notation of the chosen metadata standards, who understand how these examples would be used in software applications, and who can explain their relevance. Linked Data, JSON, XML and UML may all play a role in the examples.

This workshop will be conducted through a combination of plenary meetings and breakout groups to make the best use of participants’ time and maximise the outputs produced. In addition to the guidelines for cross-domain projects and the technical support documents, the workshop is expected to result in one or more academic articles on the topics addressed.