Outcomes of the Workshop, Dagstuhl 2016 Week 2

Motivation and Focus - Dagstuhl First Week

The following topics are the main focus of this workshop:

  • Integration of Data Capture into the DDI4 Model
  • Validation of Data Description Model
  • Re-usable Structured Documentation
  • Controlled Vocabularies
  • Funding Opportunities
  • Long-Term Metadata Infrastructure Plan for the Community

Integration of Data Capture into the DDI4 Model

Data capture is understood as an abstract layer over different data sources. Surveys are the most important data source for empirical social science, but other sources also fall under Data Capture, such as register data, data collected by devices, and Internet-based data. In each case the data can be understood as produced by a process. The process can be a black box (i.e. the data collection process is unknown), it can be partially known, or it can be well understood (as with surveys).
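The three levels of process knowledge described above can be made concrete with a minimal sketch. All names here (ProcessKnowledge, DataSource, can_describe_process) are hypothetical and are not classes of the DDI4 model; only the survey case is classified by the text, and the level assigned to the device source is an assumption for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ProcessKnowledge(Enum):
    """How much is known about the process that produced the data."""
    BLACK_BOX = "unknown"
    PARTIALLY_KNOWN = "partially known"
    WELL_UNDERSTOOD = "well understood"


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of one data source under Data Capture."""
    name: str
    process_knowledge: ProcessKnowledge


def can_describe_process(source: DataSource) -> bool:
    """A process can be documented only if at least part of it is known."""
    return source.process_knowledge is not ProcessKnowledge.BLACK_BOX


# The survey classification follows the text; the other one is assumed.
survey = DataSource("survey", ProcessKnowledge.WELL_UNDERSTOOD)
device = DataSource("device data", ProcessKnowledge.BLACK_BOX)

describable = [s.name for s in (survey, device) if can_describe_process(s)]
```

Under this sketch, a black-box source still has describable data; only the process description is missing.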

What do these different data sources have in common, what is special about each, and what can be described? The relationship to the process model and to data description should be clarified, reflected in the model, and documented. Use cases for the different scenarios should be developed, and the application of the model should be documented.

The Data Capture Model has strong relationships with other parts of the DDI Model (e.g. process pattern, data description).  These pieces still require full integration, which is an important activity for the maturity of the DDI Model.

Outcomes:

  • Documented examples covering non-survey data sources (bio-medical, process data, register data).  These may come from the first week at Dagstuhl. Emphasis should be placed on commonalities rather than on the detailed description of a particular case (a template for use cases will be provided).
  • Resolution of integration topics - model in Drupal updated
    • Design, implementation, and retrospective perspectives: is the model describing a questionnaire design? Describing historical collection? Etc. (This may be a new design pattern because process has the same issue.)
    • Integration with the Data Description Model
    • Incorporate question cascade (another potential pattern - similar to variables)
  • Create and document a plan for moving forward


Validation of Data Description Model

Describing data at the level of the atomic datum makes it possible to describe data however it is organized, not only unit record data. What are the most common forms? What are the limitations?  To assess the capabilities and quality of the existing model, a number of well-documented real-world examples will be produced.  These will include:

  • Unit record (i.e. one person per logical record)
    • As CSV
    • As fixed record length
    • Hierarchical relationship like person to household
    • Multiple physical records per case
  • Aggregate data (e.g. tables, multi-dimensional data)
  • Event history data
  • "Data lake"

A template and example datasets will be provided to support and focus this work. 
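As an illustration of the atomic-datum idea, the following minimal sketch reduces the same unit records, once stored as CSV and once as fixed record length, to an identical set of datums. The Datum structure, the function names, and the sample values are hypothetical and are not taken from the DDI4 model or from the workshop datasets.

```python
from __future__ import annotations

import csv
import io
from typing import NamedTuple


class Datum(NamedTuple):
    """One atomic value: which case, which variable, what value."""
    case_id: str
    variable: str
    value: str


def datums_from_csv(text: str, key: str) -> set[Datum]:
    """Decompose CSV unit records into atomic datums."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        Datum(row[key], name, value)
        for row in reader
        for name, value in row.items()
        if name != key
    }


def datums_from_fixed(
    text: str, layout: dict[str, tuple[int, int]], key: str
) -> set[Datum]:
    """Decompose fixed-record-length unit records into atomic datums.

    layout maps each variable name to its (start, end) column slice.
    """
    datums: set[Datum] = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        record = {n: line[a:b].strip() for n, (a, b) in layout.items()}
        datums.update(
            Datum(record[key], name, value)
            for name, value in record.items()
            if name != key
        )
    return datums


# Made-up sample data: two persons, two variables each.
CSV_TEXT = "person_id,age,sex\n001,34,F\n002,51,M\n"
FIXED_TEXT = "00134F\n00251M\n"
FIXED_LAYOUT = {"person_id": (0, 3), "age": (3, 5), "sex": (5, 6)}

csv_datums = datums_from_csv(CSV_TEXT, key="person_id")
fixed_datums = datums_from_fixed(FIXED_TEXT, FIXED_LAYOUT, key="person_id")
```

Both physical layouts yield the same four datums, which is the property a layout-independent data description relies on. Hierarchical or multi-record cases would need a richer case identifier than the single key assumed here.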

Outcomes:

  • A set of documented examples which may be published for the use of the community.
  • A document describing the validation/gap analysis of the current model.


Re-usable Structured Documentation

The documentation of DDI can be used for different purposes.  However, it is currently structured in a way that makes the re-use of documentation difficult.  At this workshop, the goal will be to identify structures that are better optimized for re-use.  The rationale for the design will be thoroughly documented and detailed examples created. 

Changes to the documentation process could be made using such a structure.   As new classes in the model are finalized, this structure would have to be completely populated.  A class will thus be accompanied by rich documentation as well as the standard property and relationship set.

The new documentation structure should meet two requirements:  1) to support re-use of documentation for different products based on the model; 2) to enable changes to the modelling process that capture better documentation.

Outcomes:

  • A document describing the new structure and providing design rationale.
  • A detailed set of examples and guidelines for use.
  • A document exploring how changes to the modelling process could result in better documentation.


Controlled Vocabularies (CV)

Primary goals include defining new work processes, prioritizing deliverables, identifying the appropriate formats for CV publication, and documenting requirements for a CV workbench tool (management requirements/functional specification to contribute to plans for the CESSDA tool).

Outcomes:

  • Requirements document regarding tool functionalities needed for developing CVs.
  • A document specifying the new canonical and other formats to be used for DDI CVs.
  • A list of prioritized CVs and a documented plan for moving ahead.
  • An agreed plan for integrating work processes into the DDI production activities.


Funding Opportunities

Explore opportunities and create documentation for potential funding proposals to local, national, and international funders. The idea is to create building blocks that community members can use when writing proposals in their own national settings.  The creation of these should be coordinated with the long-term metadata infrastructure plan for the community.

Outcomes:

  • A publishable library of proposal fragments organized according to their local, national, and international context.
  • Guidelines on how to use these fragments in funding proposals.


Long-Term Metadata Infrastructure Plan for the Community

Other scientific fields, such as astronomy, create long-term research plans. Local projects pick parts of such a plan for their work and use it when writing funding proposals. This idea could be adapted to the requirements of the DDI community.  This work should be coordinated with the creation of the library of funding proposal fragments.

DDI is the basis for building a large-scale distributed infrastructure for empirical social science. Such a plan can describe this vision. Organizations, projects, and funding proposals could refer to this plan. This is in the spirit of re-use of administered metadata and software. It would spread the load of the infrastructure across multiple organizations.

The plan should be in sync with the emerging strategic plan of the DDI Alliance.  Intended audiences include those writing proposals and those at the executive level.

Outcome:

  • A document setting out a community-level strategy for DDI and its infrastructure role.