Meeting Minutes Sept 28 2015

Attendees: Dan Gillman (chair), Michelle Edwards, Oliver Hopt, Larry Hoyle, Steve McEachern

 

Oliver distributed a PDF setting out his thinking on the Codebook model. He presented this work and talked the group through it.

Scenario A was discussed at the last meeting, but was seen to be problematic. 

Scenario B is his revised approach. It includes:

 - Study, DataResource and DataFile

 - Citation from Annotation

 

DataResource is consistent with the GSIM equivalent

 - Carries Citation which allows various subclasses to be citable

 - Has one attribute: productionInformation

 - VariableBasket and DataFile would be subclasses of DataResource

 

Study includes:

 - StudyDesign

 - Fieldwork

 - Etc.

 - Study would have an attribute DataResource  
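
The Scenario B structure described above can be sketched roughly as follows. This is only an illustration and not Oliver's actual model: the field types, the content of Citation, and the list-valued `data_resources` field are assumptions (the list reflects the suggestion in the discussion that the relationship be repeatable).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Citation:
    # Hypothetical minimal content; the minutes do not list Citation's attributes.
    title: str


@dataclass
class DataResource:
    # Carries Citation so that every subclass is citable.
    citation: Optional[Citation] = None
    # The single attribute named in the minutes; its type is an assumption.
    production_information: Optional[str] = None


@dataclass
class DataFile(DataResource):
    pass


@dataclass
class VariableBasket(DataResource):
    pass


@dataclass
class Study:
    # StudyDesign and Fieldwork are reduced to plain strings for this sketch.
    study_design: Optional[str] = None
    fieldwork: Optional[str] = None
    # Assumed repeatable: a Study may have more than one DataResource.
    data_resources: List[DataResource] = field(default_factory=list)


study = Study(
    study_design="Cross-sectional survey",
    data_resources=[
        DataFile(citation=Citation("Main data file")),
        VariableBasket(citation=Citation("Selected variables")),
    ],
)
```

Because Citation sits on DataResource, both the DataFile and the VariableBasket in the example are citable without any extra attributes of their own.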

 

Comments and discussion

 

1. Dan asked what the blue box around DataResource in Scenario B means. Oliver said that it indicates a new package, DataResource.

2. Dan asked what the cardinality of the relationship between Study and DataResource is. Oliver suggested that it should be repeatable - e.g. more than one DataFile in a Study.

3. Dan noted that DataResource is currently a collection of files or a collection of Variables. Could this include Questions?

- Oliver noted that there is currently a relationship through Measure from Question to InstanceVariable.

- We may not want to include all of the DataCapture view within Codebook

- Dan suggests that DataCapture has not yet laid out the link between the Questionnaire in the abstract and the Instrument in the physical.

- We would want to include the Questions, Skips, ResponseCategories and InterviewerInstructions.

- Which do we want - the PhysicalInstrument or the ConceptualInstrument?

- Examples: Blood Pressure measurement, CATI instrument execution

- By including Physical, do we as a result account for Conceptual?

- Larry asks whether we can include by reference. Dan argues for explicit rather than implicit reference. Larry notes that this would make an Instrument required content.

- Dan asks whether it is adequate to have just a pointer. If so, how do you link the Variable to the Question?

- Dan suggests that there IS a link between a Question and a Variable - but it does not give sufficient detail as to how a Datum was derived.

The group was generally unsure whether we want to try to link the Question and Variable - mostly because of content that already exists (particularly pre-2000).

 

Oliver brought the conversation back to what we are currently trying to model.

 

Preferably there should be some machine-actionable generated documentation which allows the links between these to be created automatically (or semi-automatically). However, in many cases this simply may not be available for past content (ADA and GESIS have examples, and we believe ICPSR as well).

 

As such, we may want to allow for simple external documents which describe the content in a human-readable (but not machine-readable or machine-actionable) form.

External resource is an option in Lifecycle - this might be the means of doing so.

 

Oliver's current model does enable this - it allows for the simple form, which can be replaced by a more complex one where that is available and/or "generateable".

Steve noted that this would also be consistent with the approach taken in Methodology.

 

Where does this leave us, and where to next?

 

Dan is concerned that we may be adding a fair amount of complexity over DDI version 2.5.

E.g. we have been discussing the link between Question and Variable - how would the user community respond to this?

 

Oliver also noted that this may touch on the discussion had with Ornulf about maintaining Codebook 2.5 through the DDI4 implementation. Ornulf's and Oliver's concern was the potential creation of too many identifiers to be maintained within a Codebook instance, and whether we would be able to handle what's done in 2.5 in a DDI4 codebook.

 

Larry noted that Colectica seem to have a potential solution to this in their current work. It seems to bypass the Lifecycle 3.2 approach and simply uses UUIDs to manage identifiability, which might work here as well.

 

What to do for next meeting?

 

Oliver undertook to clarify what the relationships between his Study object and the other packages would be (e.g. to DataCapture, Methodology, etc.).

We also need to ensure that we keep track of what the requirements are for aligning with DDI 2.5.

 

Next meeting: Monday 12th October, 8am U.S. Eastern time

Note that the local time will change for other locations due to daylight saving.

 

 

November 9, 2015

Present: Dan Gillman, Oliver Hopt, Larry Hoyle, Mary Vardigan

The group discussed whether Data Capture had made enough progress to enable Codebook to move forward. Mary will get in touch with Barry about this.

In terms of Oliver's model (the second model he proposed), the next step would be to bring in information from other groups. Access conditions was the only area not yet covered. We need to ensure that everything in Oliver's model is covered (except for Access Conditions). Oliver will go through the group's spreadsheet and map to this model to ensure full coverage.

We also need to ensure that we have adequate methodology information. We also need to be sure that full file level documentation is enabled (not just study level).

And do we want to include all of the datum-level information for reuse? This may be too much for the codebook view, which has traditionally been a flatter view of a study and the files it produces. There is a connection between variable and datum, so if we want this to be part of codebook or an extended version, it is possible.

Do we care about anything other than the instance variables in Codebook? Codebook is something you get with a file that lets you use it and interpret it. But if you have pointers to represented variables and conceptual variables you can do more.

Since codebooks are created ad hoc, that's how it's designed. There is no guarantee that the way someone creates a conceptual variable is the same as how someone else creates it. There would be no semantic interoperability. But in a future world there will be new surveys with comparability designed in. A DOI to what has been defined elsewhere would be OK.

We have polled various organizations to see which elements they use. Do we need to carry forward unused elements? This is a good point in time to simplify. For surveying DDI 3 usage, Oliver has a small XSL transformation that outputs statistics on the downward paths in any given document, which could be helpful.
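
As a rough illustration of the kind of statistic described above, the sketch below counts how often each root-to-element path occurs in an XML document. It is a Python analogue written for these notes, not Oliver's XSL transformation, and it assumes that "downward paths" means the paths from the document root down to each element. The sample fragment is illustrative only and is not a valid DDI instance.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def path_counts(xml_text: str) -> Counter:
    """Count occurrences of every root-to-element path in the document."""
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{local_name(elem.tag)}"
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


# Illustrative fragment only (hypothetical content).
sample = (
    "<codeBook>"
    "<stdyDscr><citation/></stdyDscr>"
    "<dataDscr><var/><var/></dataDscr>"
    "</codeBook>"
)
stats = path_counts(sample)
for path, n in sorted(stats.items()):
    print(n, path)
```

Run over a collection of real instances and summed, counts like these would show which elements are actually used and which never appear.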

In Data Description, there was a related discussion about how far we should chase legacy file layouts. In one sense you want to encourage people to do things in simpler ways, rather than in more complicated formats.

It was decided that the ability to include references to represented and conceptual variables is a good addition to codebook to bring in the notion of reuse.

 

 

 

 

 

 

 

November 23, 2015

Present: Dan Gillman, Michelle Edwards, Steve McEachern, Larry Hoyle

  • We want to incorporate everything from the InstanceVariable
  • Add in the connection to Question
  • Structure of the physical representation
  • We want to describe DDI-C using DDI4 elements.
  • Reuse would make some codebook instances shorter. Will people think that referencing RepresentedVariable and ConceptualVariable is required even though those references are optional?
  • What are the next directions for Codebook? Think about surveying big Codebook users, IHSN and Nesstar users in particular – along with 5-6 archives
  • Where Nesstar goes these users will follow
  • Cost will be primary driver for folks to migrate from 2.x to 4.x – some see the benefits of the DDI-L extensions
  • We need a migration path from 2.x to 4.x
  • 4.x is flexible enough that the migration path doesn’t need to be well defined
    • Should be based on your needs and what you think is appropriate first step to reuse
  • Variable bank, question bank, and Universes/Populations may be the natural first step to migration but each may present a different migration path
  • We may be able to recommend different paths
  • ISO Community – have technical reports – a series of recommendations that folks ought to follow – think of them as “Best Practices” – these may exist, but they provide guidance rather than how-to instructions
  • This is something we should seriously consider doing – maybe a Grad Student project
    • Jane Greenberg, at Drexel University – great opportunity to collaborate
    • Dan G may reach out to Jane to start a conversation
  • Back to how are we going to build codebook
  • We want to create a model-based Codebook in 4.x rather than a way to create the XML from 4.x to put into 2.x
    • This way we can do things more efficiently
    • Create an attribute that states it is being used for Purpose A or Purpose B
  • We could document how the information could be transferred without having a one-to-one relationship between objects.
  • To implement codebook in 4.x we need to describe attributes and their purpose
    • Examples:
    • Title / Alternate title / Parallel title -> have an attribute with a Controlled Vocabulary for what kind of title it is
    • Similar situation for Roles – in Codebook we have a number of different roles, let’s pare that down, use Agent, with a CV and a usage attribute that states Codebook - Roles – we recommended the Credit Taxonomy.
  • 1 object that covers a number of Codebook XML elements
  • Compactness will make it easier to maintain over the years – these could include these areas:
    • Citations
    • Publications
    • Related Materials
    • Methodology
  • Cluster elements?
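
The title example in the list above (one object with a controlled-vocabulary attribute covering several Codebook XML elements) might look roughly like the sketch below. The class names, the vocabulary values and the mapping table are hypothetical; the element names `titl`, `altTitl` and `parTitl` are taken from DDI Codebook 2.x as an assumption and should be checked against the schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class TitleType(Enum):
    """Hypothetical controlled vocabulary for the kind of title."""
    MAIN = "main"
    ALTERNATE = "alternate"
    PARALLEL = "parallel"


@dataclass
class Title:
    text: str
    type_of_title: TitleType  # the CV-typed attribute discussed in the minutes


# Assumed mapping from the CV value back to a DDI Codebook 2.x element name.
CODEBOOK_ELEMENT = {
    TitleType.MAIN: "titl",
    TitleType.ALTERNATE: "altTitl",
    TitleType.PARALLEL: "parTitl",
}


def to_codebook_elements(titles: List[Title]) -> List[Tuple[str, str]]:
    """Return (element name, text) pairs for a list of typed titles."""
    return [(CODEBOOK_ELEMENT[t.type_of_title], t.text) for t in titles]


titles = [
    Title("Example Survey 2015", TitleType.MAIN),
    Title("ES2015", TitleType.ALTERNATE),
]
pairs = to_codebook_elements(titles)
```

The same pattern (one class plus a typed attribute) would apply to Agent roles, with the role vocabulary in place of `TitleType`.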

Goal for next Meeting – December 7, 2015:

  • Review Codebook and see how we can handle current Codebook elements
    • Clusters that can stand on their own – then figure out how we can do this
    • What we need and how to manage it – then take to modellers
  • Going forward – we will review the Google spreadsheet and look at clustering its elements. What are different uses of the same structure?

 

 

 
