Purpose

Background

Standards are a vital tool enabling integration and semantic linking of data within and between disciplines. However, standards tend to get developed and adopted within disciplines or application domains with little consideration of cross-discipline requirements and technologies, so data integration can often only be easily achieved within and between closely allied fields. Addressing global scientific challenges that depend on cross-discipline integration remains difficult. The challenge is to make cross-discipline data integration a routine aspect of data-driven science.

Metadata support data discovery, selection, access and use, and are critical for data integration. Data from different sources and domains should be described so that cross-discipline discovery can detect and access the relevant data collections, and so that transformations and analyses can be automated. The use of cross-discipline data should become efficient, scalable and reproducible, enabling discipline-neutral data processing and analysis tools to be applied. It would also become possible to apply (meta)data mining approaches and reasoning. In sum, new opportunities for insight and understanding will open up.

A CODATA initiative on interdisciplinary data integration[1] is seeking to explore these challenges and opportunities in relation to three specific case studies in interdisciplinary research into infectious disease outbreaks, disaster risk and resilient cities.  These case studies provide a concrete focus for exploring the potential of interoperability and data integration through metadata alignment.

Focus of the Workshop

The workshop will build on a platform provided particularly by the following activities: (i) two previous workshops on DDI and interoperability with other specifications[2], (ii) work to extend and refine DCAT by the W3C Dataset Exchange Working Group (DXWG)[3], and (iii) the three detailed case studies and pilots from the CODATA initiative mentioned above. Metadata activities in the Research Data Alliance provide additional background and context.

There are several different areas where metadata comes into play:

  • Description of studies or data collections for discovery purposes.
  • Descriptions of provenance and scientific context for purposes.
  • Description of data variables or dimensions for analysis purposes.
  • Description of data transformation steps for recording purposes (possibly also for reusing the transformation steps on similar data).
  • Controlled vocabularies to ensure standardized and agreed concepts (in relation to variables, collections, measurements, techniques and procedures etc.).
  • The capability to express discoverable and structured metadata must be automatic and achieved as far as possible using tools that are familiar and in common use.

    Topics for Discussion and Possible Outcomes

    Areas of exploration and discussion will identify and describe the following:

    • Common rules for metadata specifications
    • Advantages and limitations of generic approaches
    • Techniques for profiling or specializing generic standards for specific applications
    • Best practices for setting up domain-specific data/metadata for cross-domain use
    • Controlled Vocabularies, domain-independent and useful domain-specific ones
    • Contact points/overlaps of specifications, crosswalks and transformations
    • Identification of gaps, possible workarounds, and possible areas for future specifications

    The output of the workshop will likely be reports and working documents on one or more of these topics.

    Metadata Specifications

    The core objective of the workshop will be to investigate and advance alignment between the cross-disciplinary and domain-specific metadata standards, and to bridge from standards focusing on collection-level to variable-level metadata.

    Metadata standards that may be considered include[4]:

    • Study- or collection-level: DCAT, Dublin Core, ISO 19115-1, DDI 4
    • Variable and dimension level
      • Microdata: DDI 4, W3C SSN, FHIR-HL7, CDISC, EML, SensorML, Frictionless data, GSIM
      • Aggregate data: W3C DataCube, ISO 19123
    • Provenance: W3C PROV-O, ISO 19115-2
    • Workflows/data transformation: DDI 4

    Data transformations to prepare data for analysis may be described in machine-actionable form. DDI 4 uses some patterns of BPMN to achieve this, and CSV on the Web addresses transformation of tabular data into semantic form.
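To make the idea of moving tabular data into semantic form concrete, the following is a minimal sketch only. It is not the CSV on the Web processing model and not part of DDI 4: it simply turns each cell of a small table into a subject-predicate-object triple. The sample data, the `example.org` base URIs and the function name `rows_to_triples` are assumptions introduced for illustration.

```python
import csv
import io

# Hypothetical sample data and base URIs, for illustration only.
SAMPLE_CSV = "id,temperature\ns1,21.5\ns2,19.0\n"
ROW_BASE = "http://example.org/row/"
COLUMN_BASE = "http://example.org/column#"


def rows_to_triples(csv_text, key_column):
    """Turn each non-key cell into a (subject, predicate, object) triple.

    The subject is built from the value in the key column, the predicate
    from the column name, and the object is the raw cell value as a string.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = ROW_BASE + row[key_column]
        for column, value in row.items():
            if column == key_column:
                continue  # the key column only identifies the row
            triples.append((subject, COLUMN_BASE + column, value))
    return triples


triples = rows_to_triples(SAMPLE_CSV, "id")
for triple in triples:
    print(triple)
```

A real pipeline would take the column-to-property mapping and datatypes from a metadata description rather than hard-coding them, which is the kind of machine-actionable description the paragraph above refers to.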

    Additional relevant standards are likely to be uncovered during the development of the CODATA initiative.

    [1] http://dataintegration.codata.org/ 

    [2] https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/39911463/Dagstuhl+Sprint+October+2016+Week+Two, https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/7864406/Dagstuhl+Sprint+October+2015

    [3] https://www.w3.org/2017/dxwg/wiki/Main_Page

    [4] A list of links to these specifications and standards is given at the end of the document



    Topic Pages




    Topics Overview




    Draft Agenda

    Link to Dagstuhl Daily Schedule (meal times, etc.)


    Attachments



    Materials for Use in Workshop




    Local Information

    Dates: October 1-5, 2018

    The workshop takes place at Schloss Dagstuhl. It has the Dagstuhl event number 18403 and a related web page.

    See the separate page for practical information.


    Attendees


    First Name | Last Name | Organization




    ...