Background Information for the Sprint
Background on the DDI Model
This is split into several sections.
The overall description of the model is at https://ddi4.readthedocs.org/en/latest/Introduction/modeldesc.html
A summary of the production process is at https://ddi4.readthedocs.org/en/latest/Introduction/modelproduction.html
The design principles are at https://ddi4.readthedocs.org/en/latest/Introduction/designprinciples.html
A summary of the main building blocks (albeit incomplete) is at https://ddi4.readthedocs.org/en/latest/Introduction/buildingblocks.html
The model definition is at https://ddi4.readthedocs.org/en/latest/Package/index.html; it is rebuilt every night from the development platform.
The model development platform is at http://lion.ddialliance.org/
If you want to look at the model, it is primarily arranged by package: http://lion.ddialliance.org/packages. At the moment these package pages are the only way to visualize a whole package.
An example is Data Capture at http://lion.ddialliance.org/package/datacapture (the visualization of the whole package is at the bottom of that page).
At the moment Functional Views do not have a graphical representation, but the content can be viewed at http://lion.ddialliance.org/views.
There will be presentations on the first day covering overarching ideas and modelling concepts, such as the use of the process model, containers, etc.
External Reviewers: Areas of Expertise and Other Standards
David Barraclough
Expertise: SDMX
Position: Chair, SDMX Statistical Working Group, OECD
I coordinate SDMX tools implementation and data exchange for the OECD, and provide internal (and, starting recently, external) SDMX training. I chair the SDMX Statistical Working Group, organising new content-oriented guidelines and working with the TWG to drive new technical developments. I attend the SDMX Secretariat and sponsor calls.
Motivation for this workshop: DDI is not used much at the OECD, or at international organisations (IOs) for that matter. I want to see how DDI can help the OECD, and how it can integrate with SDMX to provide a complete package for metadata archiving, description, and exchange modelling. Following this, perhaps the "new DDI" can be made easier for IOs to adopt, or simply more visible, by finding more use cases for IOs such as the OECD, perhaps by producing the DDI equivalent of SDMX's "content-oriented guidelines".
Gary Berg-Cross
Expertise: Semantic Web, Spatial Ontology, Semantics of SOA
Position: Chair, Spatial Ontology Community of Practice, Research Data Alliance
I work with the Research Data Alliance Data Foundations and Terminology group and related efforts to identify a cross-community harmonized vocabulary of key terms in the field of research data (science/management), as well as supporting metadata and semantics.
Michel Dumontier
Expertise: Linked Data, Life Sciences
Position: Associate Professor, Stanford University
W3C HCLS Community Profile on Dataset Descriptions: Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. We forged consensus among participating stakeholders in the Health Care and Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval. http://www.w3.org/TR/hcls-dataset/
CEDAR (Center for Expanded Data Annotation and Retrieval): Providing high-quality metadata is key to the reuse of scientific data, but authoring good metadata is tedious and prone to error. The goal of CEDAR is to create a unified framework that researchers in all scientific disciplines can use to create consistent, easily searchable metadata. http://med.stanford.edu/cedar.html
FAIR: One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, and integration and analysis of task-appropriate scientific data and their associated algorithms and workflows. We developed a minimal set of guiding principles, termed FAIR, to make data Findable, Accessible, Interoperable, and Re-usable. http://datafairport.org/
Martin Forsburg
Expertise: UN/CEFACT, Supply Chain and e-Procurement
Position: ECRU Consulting
OASIS Universal Business Language (UBL): both information modelling and methodology (XML NDR and customization techniques). https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl#overview
UN/CEFACT: working group for Supply Chain, and the methodology group responsible for XML development. http://www1.unece.org/cefact/platform/download/attachments/9666570/ISCM-Dublin_Lunch_and_Learn.ppt?version=1
CEN PC/434: committee responsible for the development of a European Norm for e-invoicing. https://www.cen.eu/work/areas/ICT/eBusiness/Pages/default.aspx
CEN WS/BII: project responsible for the development of standard processes and message designs for e-procurement. http://www.cenbii.eu/
I have also worked on implementation projects with standards from the World Customs Organization (WCO), and on financial/payment projects using ISO 20022 and XBRL.
Daniella Meeker
Expertise: HL7, Computation and Neural Systems, Health Economics
Position: Director, Clinical Research Informatics, University of Southern California
One focus of my recent work is achieving interoperability between healthcare delivery and research in federated settings, to optimize discovery and dissemination. This involves working along several dimensions of metadata: governance and stewardship, provenance, clinical workflow, semantics, and analysis and data preparation procedures. We are frequently faced with multi-source streaming transactional data that has not been collected or modelled purposefully and lacks tight linkage to a traditional information model. I am currently interested in the data science behind applications of pattern-recognition algorithms to discover missing metadata across heterogeneously collected data sets, in order to enable harmonization, reduce bias, and improve causal inference. This includes better automation in knowledge management and change data capture: understanding how semantics change over time in reaction to changes in workflow, external policies, instrument calibration, or practices.
Alejandra Gonzalez-Beltran
Expertise: Data Management, Data Standards, Semantic Web/Ontologies, Biomedical Informatics
Position: Research Lecturer, Oxford e-Research Centre, University of Oxford
The ISA framework (http://www.isa-tools.org) supports the management and curation of experimental data. It includes both a general-purpose file format and a software suite to tackle the harmonization of the structure of bioscience experimental metadata (e.g. provenance of study materials, technology and measurement types, sample-to-data relationships) in an increasingly diverse set of life science domains (including metabolomics, (meta)genomics, proteomics, systems biology, environmental health, environmental genomics, and stem cell discovery), and it also enables compliance with community standards. The community is grouped in the ISA commons (http://www.isacommons.org).
BioSharing (http://www.biosharing.org) is a curated, web-based, searchable portal of three linked registries of content standards, databases, and data policies in the life sciences, broadly encompassing the biological, natural, and biomedical sciences. Our standard and database records are informative and discoverable, maximizing standards adoption and (re)use (e.g. in data policies) and allowing the monitoring of their maturity and evolution.
bioCADDIE (https://biocaddie.org/) engages a broad community of stakeholders to create the NIH Big Data to Knowledge (BD2K) Data Discovery Index (DDI, not to be confused with the DDI metadata standard). The Data Discovery Index will do for data what PubMed (and PubMed Central) did for the literature. I am a member of Working Group 3 (https://biocaddie.org/group/working-group/working-group-3-descriptive-metadata-datasets), which focuses on descriptive metadata for datasets and has developed a first metadata specification (http://dx.doi.org/10.5281/zenodo.28019).
CEDAR (http://metadatacenter.org) is the NIH BD2K Center for Expanded Data Annotation and Retrieval. Providing high-quality metadata is key to the reuse of scientific data, but authoring good metadata is tedious and prone to error. CEDAR's goal is to create a unified framework that researchers in all scientific disciplines can use to create consistent, easily searchable metadata. We will work on supporting templates from BioSharing standards.
STATO (http://stato-ontology.org/) is a general-purpose STATistics Ontology. Its aim is to provide coverage for processes such as statistical tests and their conditions of application, and for the information needed by or resulting from statistical methods, such as probability distributions, variables, and spread and variation metrics. STATO also covers aspects of experimental design and the description of plots and graphical representations commonly used to provide visual cues of data distribution or layout and to assist review of results. STATO has been developed to interoperate with other OBO Foundry ontologies (e.g. OBI); hence it relies on BFO as a top-level ontology and uses OBI as a mid-level ontology.