Meeting: 28 September 2015

Attendees: Dan Gillman (chair), Michelle Edwards, Oliver Hopt, Larry Hoyle, Steve McEachern


Oliver distributed a PDF outlining his thinking on the Codebook model, then presented the work and provided commentary on it.


Scenario A was discussed at the last meeting, but was seen to be problematic. 


Scenario B is his revised approach. It includes:

 - Study, DataResource and DataFile

 - Citation from Annotation


DataResource is consistent with the GSIM equivalent

 - Carries Citation which allows various subclasses to be citable

 - Has one attribute: productionInformation

 - VariableBasket and DataFile would be subclasses of DataResource


Study includes:

 - StudyDesign

 - Fieldwork

 - Etc.

 - Study would have an attribute DataResource  
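
As a rough illustration only, the Scenario B structure described above could be sketched as Python dataclasses. The field types, the Citation content, and the use of a list for the Study's DataResource attribute are assumptions made for the sketch; they are not taken from Oliver's PDF, and "Citation from Annotation" is not modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    # Hypothetical minimal content; the minutes do not list Citation's fields.
    title: str


@dataclass
class DataResource:
    # Carries a Citation so that every subclass is citable.
    citation: Optional[Citation] = None
    # The one attribute named in the minutes; its type is an assumption.
    productionInformation: Optional[str] = None


@dataclass
class DataFile(DataResource):
    pass


@dataclass
class VariableBasket(DataResource):
    pass


@dataclass
class Study:
    # StudyDesign and Fieldwork are reduced to placeholder strings here.
    studyDesign: Optional[str] = None
    fieldwork: Optional[str] = None
    # Modelled as a list on the assumption that the attribute is repeatable.
    dataResources: List[DataResource] = field(default_factory=list)


# Example: one Study holding a citable DataFile and a VariableBasket.
study = Study(
    dataResources=[
        DataFile(citation=Citation("Wave 1 data file")),
        VariableBasket(),
    ]
)
```

Because Citation sits on DataResource, both DataFile and VariableBasket become citable without declaring anything themselves.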


Comments and discussion


1. Dan asked what the blue box around DataResource in Scenario B means. Oliver said that it indicates a new package, DataResource.

2. Dan asked what the cardinality of the relationship between Study and DataResource is. Oliver suggested that it should be repeatable, e.g. more than one DataFile in a Study.

3. Dan noted that DataResource is currently a collection of files or a collection of Variables, and asked whether it could also include Questions.

- Oliver noted that there is currently a relationship through Measure from Question to InstanceVariable.

- We may not want to include all of the DataCapture view within Codebook

- Dan suggests that DataCapture has not yet laid out the link between the Questionnaire (the abstract) and the Instrument (the physical).

- We would want to include the Questions, Skips, ResponseCategories and InterviewerInstructions.

- Which do we want - the PhysicalInstrument or the ConceptualInstrument?

- Examples: Blood Pressure measurement, CATI instrument execution

- By including Physical, do we as a result account for Conceptual?

- Larry asks whether we can include by reference. Dan argues for explicit rather than implicit reference. Larry notes that this would make an Instrument required content.

- Dan asks whether it is adequate to have just a pointer. If so, how do you link the Variable to the Question?

- Dan suggests that there IS a link between a Question and a Variable, but that on its own it does not give sufficient detail about how a Datum was derived.

The group was generally unsure whether we want to try to link the Question and Variable, mostly because of content that already exists (particularly pre-2000).


Oliver brought the conversation back to what we are currently trying to model.


Preferably there would be some generated, machine-actionable documentation that allows the links between these to be created automatically (or semi-automatically). However, in many cases this may simply not be available for past content (ADA and GESIS have examples, and we believe ICPSR does as well).


As such, we may want to allow for simple external documents which describe the content in a human-readable (but not machine-readable or machine-actionable) form.

External resource is an option in Lifecycle, and this might be the means for doing so.


Oliver's current model does enable this: it allows for the simple form, while allowing it to be replaced by a more complex one where that is available and/or "generateable".

Steve noted that this would also be consistent with the approach taken in Methodology.
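
The "simple, replaceable by more complex" idea discussed above might look something like the following sketch. All class, function and file names here are hypothetical and chosen only for illustration; they do not come from Oliver's model or from Lifecycle.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class ExternalDocument:
    """Human-readable description only, e.g. a scanned questionnaire."""
    location: str


@dataclass
class QuestionVariableLinks:
    """Machine-actionable documentation: variable name -> question name."""
    links: Dict[str, str]


# A codebook would carry either the simple form or the richer form.
InstrumentDocumentation = Union[ExternalDocument, QuestionVariableLinks]


def question_for(doc: InstrumentDocumentation, variable: str) -> Optional[str]:
    """Return the linked question name, or None if no link is recorded."""
    if isinstance(doc, QuestionVariableLinks):
        return doc.links.get(variable)
    # Only an external document exists: a person has to read it.
    return None


# Legacy content with only a document, and newer content with explicit links.
legacy = ExternalDocument("questionnaire_1998.pdf")
rich = QuestionVariableLinks({"age": "Q1"})
```

With this shape, legacy content stays valid with nothing more than a pointer to a document, and the richer form can be swapped in later without changing the rest of the codebook.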


Where does this leave us, and where to next?


Dan is concerned that we may be adding a fair amount of complexity over DDI version 2.5.

E.g. we have been having discussions about the link between Question and Variable. How would the user community respond to this?


Oliver also noted that this may touch on the discussion had with Ornulf about maintaining Codebook 2.5 through the DDI4 implementation. Ornulf's and Oliver's concern was the potential creation of too many identifiers to be maintained within a Codebook instance, and whether we would be able to handle what is done in 2.5 in a DDI4 codebook.


Larry noted that Colectica seem to have a potential solution to this in their current work. It appears to bypass the Lifecycle 3.2 approach and simply use UUIDs to manage identifiability.
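
To illustrate the general idea of UUID-based identifiability, here is a minimal sketch using Python's standard `uuid` module. It is not Colectica's implementation, which these minutes do not describe; the `Identified` and `Variable` classes are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Identified:
    # A random (version 4) UUID assigned once, when the object is created.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Variable(Identified):
    # Hypothetical identifiable object; only a name is carried here.
    name: str = ""


age = Variable(name="age")
income = Variable(name="income")
```

Each object gets its own identifier without any central registry or agency/version bookkeeping, which is what would reduce the burden of maintaining identifiers within a Codebook instance.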


What to do for next meeting?


Oliver undertook to clarify what the relationships between his Study object and the other packages would be (e.g. to DataCapture, Methodology, etc.).

We also need to ensure that we keep track of the requirements for aligning with DDI 2.5.


Next meeting: Monday 12th October, 8am U.S. Eastern time

Note that the local time will change for other locations due to daylight saving time.