Data Capture Dagstuhl 2016 Notes

 Day One Notes

Title of session: Data Capture

Day: Monday, October 24

Participants: Barry Radler, Dan Smith, Wolfgang Zenk-Möltgen, Kelly Chatain, Arofan Gregory (visiting)

Chair: Barry Radler

Note taker: Kelly Chatain

Morning Session

Background information

The Data Capture team has done little work since Dagstuhl 2015. In general, the team is waiting for feedback on completed work that still needs review.

Reviewed the team membership. A few members (Sophia, Ingo, and Brigitte) were removed due to inactivity and other reasons. Jannik has not had time to help with the modelling, so a new modeller for Data Capture should be assigned (issue created in Jira).


Backing-up and Reviewing the model:

The group went back to the model to review changes made by others outside of the team. 

**Everything ties into the process model through a DDI Act. But how? Where do we define the questionnaire flow (the sequence) and the instrument itself?

10:55 - Arofan brings in Larry's rendering of what's in the Drupal workflow

Dan: Wants to get some response domains defined. 

Issue: How to describe what the actual response domain is?

Goal - Basic questionnaire, workflow

**If methodology needs more information about an act, then they should add it to their patterns. 

Decisions

Use a basic questionnaire to walk through the model, perhaps using three examples to illustrate 1) prescriptive needs for designing a questionnaire, 2) run-time needs for processes, and 3) archival needs for describing the data after it is received.

Afternoon Session

Participants: Kelly Chatain, Barry Radler, Dan Smith, Wolfgang Zenk-Möltgen

Created a two-question, one-measurement instrument with an introductory statement and interviewer instructions (see attached).


Decisions

 Will create one example to illustrate prescriptive (design), run-time, and archive activities.


Issues: All issues logged as of Wednesday, Oct. 26th

How to name a question?

Problems with bindings In/Out parameters


DDI Data Capture Use Case #1.

Description: Permission form for blood pressure data capture performed in a clinical setting by a nurse or clinician. 

Introduction: Please complete the following fields to provide permission for collecting a blood pressure reading.


Question1: “What is your full name?”

Description: Question with two separate responses; response options are open-ended text.


  1. First name ___________________
  2. Last name ____________________


Question2: May we perform a blood pressure reading on you?

Description: Question with one response; response option is closed-ended dichotomous category.


  1. Yes
  2. No


Instructions to Interviewer/Clinician: If participant provides permission, attach blood pressure cuff and perform standard 30-second BP measure. Record blood pressure and pulse in fields below.


Measurement1:


  1. Systolic blood pressure: _ _ _
  2. Diastolic blood pressure: _ _ _
  3. Pulse (beats per minute): _ _ _
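
To make the structure of this use case concrete, here is a minimal Python sketch of it as plain data. The class and field names (ResponseField, Item, Instrument) are hypothetical illustrations and not DDI 4 classes; the "Permission" label and the Measurement1 text are also assumptions added for the sketch. Only the wording of the introduction, questions, and measurement fields comes from the use case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponseField:
    """One place where a value is recorded (hypothetical, not a DDI class)."""
    label: str
    kind: str                          # "text", "code", or "numeric" (illustrative)
    codes: Optional[List[str]] = None  # only used for closed-ended responses


@dataclass
class Item:
    """A question or a measurement in the instrument (hypothetical)."""
    name: str
    text: str
    responses: List[ResponseField] = field(default_factory=list)


@dataclass
class Instrument:
    introduction: str
    items: List[Item]
    interviewer_instruction: str


use_case_1 = Instrument(
    introduction="Please complete the following fields to provide permission "
                 "for collecting a blood pressure reading.",
    items=[
        Item("Question1", "What is your full name?",
             [ResponseField("First name", "text"),
              ResponseField("Last name", "text")]),
        # "Permission" is an assumed label; the use case does not name this field.
        Item("Question2", "May we perform a blood pressure reading on you?",
             [ResponseField("Permission", "code", codes=["Yes", "No"])]),
        # The item text here is an assumed summary; the use case gives no text.
        Item("Measurement1", "Blood pressure and pulse",
             [ResponseField("Systolic blood pressure", "numeric"),
              ResponseField("Diastolic blood pressure", "numeric"),
              ResponseField("Pulse (beats per minute)", "numeric")]),
    ],
    interviewer_instruction="If participant provides permission, attach blood "
                            "pressure cuff and perform standard 30 second BP measure.",
)

# Count the individual values the instrument captures across all items.
total_fields = sum(len(item.responses) for item in use_case_1.items)
print(total_fields)
```

Keeping each response as its own field mirrors the description of Question1 as one question with two separate responses.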


 First graphs:

How DDI 3.2 represents this data capture use case.


How DDI 4 will capture this use case, using represented and instance questions/measures.


DDI 4 with bindings


 Day Two Notes

Morning Session

Attending: Kelly Chatain, Wolfgang Zenk-Möltgen, Barry Radler, Dan Smith

Discussion of other classes:

  • Webscraping
  • Twitter harvesting
  • Qualitative data
  • Biomedical

Issues: (All issues logged as of Wednesday, Oct.26th)

  1. The ambiguity and/or redundancy of the name, title, or annotation of a question DCAP-4
  2. The In/Out parameters should have at least names, description, and representation, maybe even default value? hasParameters DCAP-5
  3. We have to create response domains, because there aren't any. A scale, an integer, etc. We should take the existing DDI 3.2 responseDomains and model them in DDI 4. DCAP-6
  4. Workflow - No object relationships other than WorkflowSequence apply to this example, which only requires defining logical steps. The others should be moved to methodology.
  5. All identified items should have the ability to link to additional material, e.g. a scanned copy of a questionnaire. DCAP-7
  6. WorkflowSequence has three ways to define order (SpecificSequence, Collection type, and realizes Collection, within which you can also set an order), none of which work for our purposes. In addition, WorkflowSteps do not imply order in RDF (in XML, they do). DCAP-8
  7. Bindings should not be annotatedIdentified. All they need are the in/out parameters; no extra information is needed to make these connections.


The Work:

Discussion about the bindings between the output parameters required for making the elements reusable. 

The parameter collection is what you refer to when you create the binding. Parameters are defined in WorkflowStep.
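
As a rough illustration of how a binding could connect parameters between steps, here is a minimal Python sketch. It is an assumption-laden toy, not the DDI 4 model: the ParamRef tuple, the apply_bindings helper, and the parameter names "permission" and "permission_given" are all hypothetical, and the source/target naming follows the proposal recorded in the issue list rather than the current model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A parameter is identified here by (step name, parameter name).
# This tuple is a hypothetical stand-in for In/Out parameters on a WorkflowStep.
ParamRef = Tuple[str, str]


@dataclass(frozen=True)
class Binding:
    """Connects one parameter to another; carries no annotation of its own."""
    source: ParamRef
    target: ParamRef


def apply_bindings(bindings: List[Binding],
                   produced: Dict[ParamRef, str]) -> Dict[ParamRef, str]:
    """Copy each produced source value to the parameter it is bound to."""
    return {b.target: produced[b.source]
            for b in bindings if b.source in produced}


# Hypothetical parameter names for the blood pressure example.
permission_out: ParamRef = ("Question2", "permission")
permission_in: ParamRef = ("Measurement1", "permission_given")

bindings = [Binding(source=permission_out, target=permission_in)]
received = apply_bindings(bindings, {permission_out: "Yes"})
print(received[permission_in])
```

Because the binding only names its two parameters, the question and the measurement stay reusable: neither needs to know about the other.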

Will comment out the issues in the example and link them to actual JIRA Issues, when they are logged.

Afternoon Session

Attending: Kelly Chatain, Wolfgang Zenk-Möltgen, Barry Radler, Dan Smith

Beginning with the description, which goes into Abstract

Question: Where should language be recorded as a property of the fielded questionnaire, but not of the specification design? Perhaps in implementedInstrument as opposed to conceptualInstrument.

Question: Layout - can capture pdf or screenshot, but cannot prescribe the layout automatically. 

Feedback: 

Issues:

  1. Distinguish patterns better from properties in Lion in order to more clearly see what the actual content is and to allow modellers to more easily use patterns and not create relationships to use them. DCAP-9
  2. Remove instructionalCommand from base Act class. Not all Acts need it. DCAP-10
  3. Too much information in workflowStep. Not every step requires purpose, usage, and overview; these should be removed and added in an extended class if needed. DCAP-11
  4. Base workflowStep does not require hasProcessFramework or isPerformedBy; these should be created in a derived class as needed. DCAP-12
  5. Removing SubstantiveValueDomain and SentinelValueDomain from ResponseDomain because they are inappropriate. DCAP-13
  6. RepresentedQuestion does not have output parameters - But should they be added to ResponseDomain or RepresentedQuestion? We think ResponseDomain. DCAP-14
  7. Moving RepresentedVariable from RepresentedQuestion to ResponseDomain DCAP-15
  8. RepresentedVariable should not have a universe; that should be at the instance level (universes only apply to instantiated variables, not represented variables) DCAP-16
  9. hasIntendedDataType should be incorporated into the value domain DCAP-17
  10. SubstantiveValueDomain does not specify different types of ValueDomain with enough specificity, e.g. numeric, text, date/time, ranges, etc. Only provides one string. DCAP-18
  11. The Parameters class does not need to be identified or annotated, it's just a collection of input/output parameters. DCAP-19
  12. workflowSteps should have bindings!!! DCAP-20
  13. ElseIf should not be identified or annotated and should not derive from ConditionalControlConstruct DCAP-21
  14. ElseIf cardinality should be 0..n not 1..1. DCAP-22
  15. ControlConstruct should not contain a WorkflowStep (they should all be in WorkflowSequence). Control creates the logical flow, and having a child workflow step disrupts that: it gives two ways to have nested workflow steps. DCAP-23
  16. IfThenElse - elseContains should be 0..1 - only one else available DCAP-24
  17. CommandCode definition needs to be updated to remove "...definition of InParameter and OutParameter and binding declared within command code...." DCAP-25
  18. structuredCommand is very XML centric in its description and the way it is used. DCAP-26
  19. Command definition does not need "...definition of InParameter and OutParameter and binding declared within command code...." DCAP-27
  20. Binding - Target Objects - InputParameter and OutputParameter should be 0..1 DCAP-28
  21. Binding should be updated to use source and target instead of input and output and should allow either InputParameter or OutputParameter in both. DCAP-29
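
The cardinalities requested for IfThenElse (any number of ElseIf branches, at most one else) can be sketched in Python as follows. This is only an illustration under assumptions: the attribute names, the use of step names as plain strings, and the routing on the answer to Question2 are hypothetical and are not taken from the Lion model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]


@dataclass
class ElseIf:
    """A plain branch: a condition plus the step to go to (no identity or annotation)."""
    condition: Callable[[Answers], bool]
    then_step: str


@dataclass
class IfThenElse:
    condition: Callable[[Answers], bool]
    then_step: str
    else_ifs: List[ElseIf] = field(default_factory=list)  # cardinality 0..n
    else_step: Optional[str] = None                       # cardinality 0..1

    def next_step(self, answers: Answers) -> Optional[str]:
        """Return the name of the step to run next, or None if no branch applies."""
        if self.condition(answers):
            return self.then_step
        for branch in self.else_ifs:
            if branch.condition(answers):
                return branch.then_step
        return self.else_step


# Hypothetical routing: take the measurement only if permission was given.
routing = IfThenElse(
    condition=lambda a: a.get("Question2") == "Yes",
    then_step="Measurement1",
)

print(routing.next_step({"Question2": "Yes"}))
print(routing.next_step({"Question2": "No"}))
```

The construct refers to steps by name instead of containing them, so the steps themselves can all live in one sequence.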

Changed: In Capture - measurementName switched to Name

Changed: In Capture - changed to 0..n from 0..1 in source cardinality ResponseDomain


 Day Three Notes

Notes:

Request to look at other types of captures at the high level. Are there candidates to go into the element registry? Besides representedQuestion and representedMeasure - are there other items that could go in? Taina mentioned something about controlled vocabulary for types of captures. 

Wendy wants us to think about describing pulling admin data as capture vs. analysis (Wolfgang). Should we model so that a computer can act, or just describe how it was done? What is needed for automatic, real-time data collection? Fitbit example - has software by device and brand, collecting the data from the API. How to deploy to Blaise or other? When I see the measurement type I will use this piece of code I've written. Can't code for every system or device.

What is needed in a reusable measurement vs. an instantiated one? For instance, the serial number of the blood pressure device could be in instanceMeasurement.

Barry - Need a link to the white paper from 4-5 years ago?

representedMeasure and use of controlled vocabulary for the type of measure, the responsibility of the user to create and use those vocabularies.

In 3.2, ProcessingEvent was vague enough to capture instructions for preparing to take a blood pressure measurement, for instance.

Do we pull in a transformation as a Data Capture or push it out to the variables? For example, selecting a subset of tweets to save for future research, which is external to DDI.

Measure defined as a data processing step, then that helps with the selection/analysis.

New class - administrative data gathering item - i.e. DataSource class?

Big use cases that deserve their own measurement class.

Ingo's use case involving educational testing. 

George's question: Pointing to ontologies with a measurement? For example LOINC, https://loinc.org/

How do we link the concept? What is the question intent? 


Questions:

Content for instanceVariable? Is there a link to the capture for this (question or measure), or a link to a different instanceVariable as a source variable?

Linkages to the representedVariable and conceptualVariable on the Data Description side.

This photo was updated on Day Four.

 Day Four Notes

Morning Session

Attending Joint Discussion:

Data Description: Achim Wackerow, Knut Wenzig, Dan Gillman, Larry Hoyle, Arofan Gregory

Data Capture: Kelly Chatain, Barry Radler, Wolfgang Zenk-Möltgen, Dan Smith


Joint Discussion Notes:

Both DD and DC have instanceVariables, so how do the question and definition match up? RepresentedVariable should be in Data Capture because it is created by the ResponseDomain as the intended data type for that question. 

Can create new instances of ResponseDomain to specify new set of values for different back end systems (Dan S.)

Do sentinel value codes lead to different physical instance data sets? 

If you have different sentinel values, then you have different RepresentedVariables (Dan S.)

Some systems select the sentinel codes, some can't. 

One conceptual variable and several representations (Dan S.)

One represented variable and several instances (Dan G.) Concept to terms, not terms to concepts. 

Data Description - are 0,1,1,0 and 0110 two separate instance variables or one? A copy to a new file is exactly the same. 

(Kelly loses track of the conversation)

Still trying to define the instance variable

Start with the file--the byte points to a value mapping, the value mapping is associated with instance variable, that instance variable points to the instance question, instance question to the represented question, which leads to represented variable. 

Dan Smith - Data Description somehow needs to point to the specific response domain of the represented question

Where the instance question to the response domain

The Dans agree - Instance variable links to instance question links to represented question which has response domains. Represented question links to represented variable which encodes the response domain.

Data Description to file three issues - create relationships between IV and IQ also IV to RD1, RD2, RD3. Dan Smith question: How to create derived instance variables? What relationship do we create on an instance variable to point to source variables for derivations? For Data Description to log as an issue.

THE DECISION?!

IV=instanceVariable

IQ=instanceQuestion

RQ=representedQuestion

RV=representedVariable

RD=responseDomain
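
The chain the Dans agreed on can be walked in code. The following Python sketch is illustrative only: the classes carry just a name and the links described in the notes, the attribute names are hypothetical, and the example object names (such as "permission_2016") are invented for the demonstration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResponseDomain:
    name: str


@dataclass
class RepresentedVariable:
    name: str


@dataclass
class RepresentedQuestion:
    name: str
    response_domains: List[ResponseDomain]
    represented_variable: RepresentedVariable  # encodes the response domain


@dataclass
class InstanceQuestion:
    name: str
    represented_question: RepresentedQuestion


@dataclass
class InstanceVariable:
    name: str
    instance_question: InstanceQuestion


def trace(iv: InstanceVariable) -> List[str]:
    """Follow IV -> IQ -> RQ -> RV and return the names along the way."""
    iq = iv.instance_question
    rq = iq.represented_question
    return [iv.name, iq.name, rq.name, rq.represented_variable.name]


# Invented example names, loosely based on Question2 of the use case.
rq = RepresentedQuestion(
    name="RQ_permission",
    response_domains=[ResponseDomain("RD_yes_no")],
    represented_variable=RepresentedVariable("RV_permission"),
)
iv = InstanceVariable("permission_2016", InstanceQuestion("IQ_permission", rq))
print(" -> ".join(trace(iv)))
```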

Second Morning & Afternoon Session

Kelly Chatain, Barry (half-time), Wolfgang, Dan Smith, and Kerrin!

Looking at response domains, pulling from DDI 3.2. The recommended data type (question) is being replaced by the represented variable type; need to represent missing values.

Response options to define (all extend ResponseDomain):

TextResponseDomain (New Class) 3.2 TextDomainType

NumericResponseDomain (New Class) 3.2 NumericDomainType

CodeResponseDomain (New Class) 3.2 CodeDomainType

Will include a conditional text domain to include "Other" entries with a code list.
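
A minimal Python sketch of the three response domains as subclasses of a common base might look like the following. Everything beyond the class names is an assumption made for illustration: the accepts method, the max_length, low, high, and codes attributes, and the example ranges are hypothetical and do not reproduce the DDI 3.2 or DDI 4 property lists.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResponseDomain:
    # Placed on the base class so every derived domain inherits it.
    intended_representation: str

    def accepts(self, value: str) -> bool:
        raise NotImplementedError


@dataclass
class TextResponseDomain(ResponseDomain):
    max_length: Optional[int] = None  # hypothetical property

    def accepts(self, value: str) -> bool:
        return self.max_length is None or len(value) <= self.max_length


@dataclass
class NumericResponseDomain(ResponseDomain):
    low: Optional[float] = None   # hypothetical range bounds
    high: Optional[float] = None

    def accepts(self, value: str) -> bool:
        try:
            number = float(value)
        except ValueError:
            return False
        if self.low is not None and number < self.low:
            return False
        if self.high is not None and number > self.high:
            return False
        return True


@dataclass
class CodeResponseDomain(ResponseDomain):
    codes: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        return value in self.codes


# Example domains for the blood pressure use case (ranges are assumptions).
first_name = TextResponseDomain("text", max_length=40)
permission = CodeResponseDomain("code", codes=["Yes", "No"])
systolic = NumericResponseDomain("integer", low=0, high=999)

print(systolic.accepts("120"), permission.accepts("Maybe"))
```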


Wolfgang - We need a touch point between data capture and analysis (exists as a GenerationInstruction in DDI 3.2) to capture derived and other transformations. DCAP-32

Question: Why are the properties Name and DisplayLabel on each item and not inherited? DCAP-33

Question: intendedRepresentation - should that be in the ResponseDomain and then inherited by the other domain types? We put it in the base ResponseDomain. DCAP-34

Question: Do we need classificationLevel as a property in the NumericResponseDomain? DCAP-35

Issue -

  1. Need to add OutParameters to the response domain when it's complete DCAP-36
  2. NumberRange does not need to be annotated or identifiable. If you agree, then we will remove it as a relationship and make it a property. DCAP-37
  3. How do we create enumerations in Drupal? i.e. classificationLevel 'nominal', 'ordinal', 'interval', etc. Do we even need these? DCAP-38
  4. Please create controlled vocabulary for different units (SI categories?) Include wikipedia link. DCAP-39
  5. Response Domains to be completed later:

    Check all that apply....boolean variable?
    Scale - started
    Ranking (put these in order)
    Raw Images (mimic the annotation of the files like architectural)
    Date 


Action Items:

What to do about multiple selection on code list response domain? DCAP-40

Should we add intendedRepresentation to ResponseDomain DCAP-41

What to do about conditional text with code list response domain?

Outline a plan for moving forward:

Review summary of Dagstuhl Sprint - Kelly

Set up a Data Capture phone call - Barry

Continue to work on JIRA issues (before or during EDDI?) Virtual sprint 


 Day Five Notes

Kelly, Wolfgang, Dan, Kerrin

Storing multiple values - five pieces of data as checked or unchecked or 5 variables with coded representation. In Lion you can't have a choice in the model, you use it using this kind of represented variable.

Issues:

  1. How do we manage missing representation? DCAP-42
  2. Why do we have target cardinality when it can't be restricted in either the XML or RDF at least in the current version of Lion? DCAP-43
  3. Wolfgang's question about category attached to booleanresponsedomain. DCAP-44
  4. ScaleResponseDomain needs to be modeled. DCAP-45
  5. RankingResponseDomain needs to be modeled. DCAP-47
  6. RankingDomain - recommend creating a ranking group that ranking domains will link to. DCAP-46

Adding a conditional relationship to the CodeListResponseDomain called SpecifyOtherResponseDomain.

Attaching display label to response domain.

The CodeListResponseDomain allows for a conditional relationship to, and creation of, a SpecifyOtherResponseDomain. The CodeListResponseDomain can be called a number of times, for example when choosing multiple responses from one code list.

Adding category to the boolean (two valued, binary variable) response domain, for checked/unchecked responses. Only allowed to add one category. 
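
The two ideas above, a conditional "specify other" text field on a code list and checked/unchecked storage for multiple selection, can be sketched in Python as follows. This is a hedged illustration: the attribute names (trigger_code, specify_other), the helper as_checkboxes, and the example code list are hypothetical and are not part of the model.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class SpecifyOtherResponseDomain:
    """Free-text follow-up that applies only when a given code is chosen."""
    trigger_code: str


@dataclass
class CodeListResponseDomain:
    codes: Dict[str, str]  # code -> category label
    specify_other: Optional[SpecifyOtherResponseDomain] = None

    def needs_text(self, code: str) -> bool:
        """True if choosing this code opens the 'specify other' text field."""
        if code not in self.codes:
            raise ValueError(f"unknown code: {code}")
        return (self.specify_other is not None
                and code == self.specify_other.trigger_code)


def as_checkboxes(domain: CodeListResponseDomain,
                  selected: Set[str]) -> Dict[str, bool]:
    """Represent a multiple selection as one checked/unchecked value per code."""
    return {code: code in selected for code in domain.codes}


# Invented code list for illustration only.
arm = CodeListResponseDomain(
    codes={"1": "Left arm", "2": "Right arm", "9": "Other"},
    specify_other=SpecifyOtherResponseDomain(trigger_code="9"),
)

print(arm.needs_text("9"))
print(as_checkboxes(arm, {"1", "9"}))
```

Storing one boolean per code is one of the two options discussed; the alternative is one coded variable per selection slot, each drawing on the same code list.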

Added ScaleResponseDomain - operates like CodeListResponseDomain, calling out the different scales - still needs to be modeled.

Removed hasIntendedRepresentation from Capture because all response domains have it.

Added numericResponseDomains but did not put in the represented variables.