Data harmonisation and cleaning (group 5)

Regularising approaches to data harmonisation and cleaning using a library of processes

Ernie Boyko*

erol.orel

Larry Hoyle (notetaker)

Dan Gillman

Steven McEachern

Guidelines for Presentations on Technology2.docx


  1. Practitioner-level guidelines (Ernie to lead): 
    1. Data Processing Across Domains Using Shared Libraries and Practices - Best Practice Guidelines and Recommendations - Word doc DOCX

      1. Draft recommendations (Dan Gillman, to be integrated) - Word Doc DOCX
      2. Paper with Larry's content and Steve's API (does not include Dan's recommendations): Data processing across domains-challenges and opportunties2019_10_11_10_52.docx

  2. Technical guidelines/implementation examples (Steve to lead):
    1. Implementing the case study example now
      1. Public RMarkdown Document - RPubs: http://rpubs.com/stevenmce/DagstuhlGroup5_R_Example1_NOW
      2. Underlying processing syntax - Zip file
    2. Implementing the case study with simple interventions - to be drafted
    3. Automating the case study in the future, implementing approaches from other groups - to be drafted
  3. Revision to the conceptual framework (Ernie's early work during the week)

GOOGLE doc: https://docs.google.com/document/d/1IO8-bJQskZic11IFvXeqj1uJYJ77QlWO/edit

In which cells in the matrix are we working?

This topic doesn't fit neatly into any of the sub-categories in the matrix. It does, however, fall under the I (Interoperability) and R (Reusability) principles of FAIR.

To which stage in processing does this relate?

Data Provider
Producer of Cross-Domain Data
Provider of Harmonization Procedures
Provider of Cross-Domain Data

What level of guidelines are we working on?

Working Documents




Technical Guidance - Part 1 - RMarkdown - The world as it is now - http://rpubs.com/stevenmce/DagstuhlGroup5_R_Example1_NOW