
Simple Codebook Meeting Minutes
February 2, 2015

 

Present: Dan Gillman, Oliver Hopt, Larry Hoyle, Steve McEachern, Mary Vardigan

The Simple Codebook committee will now be chaired by Dan Gillman as Wolfgang is not able to chair currently.

This group has been in a holding pattern because we are waiting on the results of other groups. However, it was suggested that we look at Codebook 2.5 in comparison with DDI 3.*.

XML permits a detailed description of elements, and this is part of the distinction between 2 and 3. But UML doesn't allow this and doesn't account for nesting and levels of detail. We should try to incorporate what is in Version 2 into the model as best we can. We as a group should try to build this. One additional possible advantage would be that we could then have a single model to account for both Codebook and Lifecycle. Both views would be under one spec in this approach.

Is referencing and reusability a distinction between the two versions that we should take into account? Should it be communicated to the modeling team that we may not need the complexity?

Users who want to describe their data should be able to write a description and fit it into a framework. If you want interoperability with other systems, then referencing and reusability become an issue.

A standalone one-off research project will not be reusing variables and questions. But for longitudinal research and research across languages and cultures, reuse is important: harmonizing across questionnaires, reusing across time, etc. Maybe this is Complex Codebook?

We need a distinction between the user perspective and the technical perspective. Simple and complex need to be interoperable. It's necessary to reduce the complexity of what is modeled in the library by choosing the simple cases.

One of the decisions for DDI 4 is to make everything identifiable and drop the container aspect of identifiability. This takes away a lot of the complexity.

From a marketing perspective, we need to distinguish between the DDI Codebook version and the Simple Codebook view. Looking at what is in 2 now will be required, and we need to lay out what we need to account for. In the study section for DDI Codebook, there were a number of elements that allowed you to provide a high-level text description of various methodological things. Preserving that is important.

Capturing what is in an SPSS or SAS representation, including all the metadata you can put there, is important. When you move data around you don't want to lose anything. When you look at how researchers want to record information, it is often difficult for them to record things in detail. Guided structures for them as part of their workflow are important, and this is one view that could help them with this. You need some structure that becomes machine-actionable. You don't want people to just write a narrative.

At BLS, there is a Handbook of Methods. It has narrative descriptions of the surveys BLS does, and it doesn't have a lot of detail. This should be captured in DDI rather than in a PDF. There is a need for both high-level and detailed description. There may still be a need for a DDI Lite as a way of inducing reluctant data producers to get involved. For variables the detail is necessary. We should make this as flexible as we can.

We can start by looking at what is in 2.5 and drawing up a list of what we need to account for. This would be a set of requirements that we as a group need to figure out how to solve. One question we want to address from a modeling point of view is where high-level descriptions belong. For example, we need to say how the sample is constructed: would those higher-level descriptions go in a class of things that is independent of everything else, or would they be part of a sampling class? These are design issues that might have an impact on the way the more detailed model plays out.

If we can manage both 2 and 3 in the same structure we as a standards body will have an easier time with this. We should consult with Wendy on this.

Several archives still rely on DDI Codebook, Nesstar, etc. There is a set of codebook specs from different archives.

Are we talking about having our Simple Codebook view cover everything that is in 2.5? It should cover even less. But should there be a view that is everything in 2.5? One idea is a view that is a really simple codebook but that allows for complexity in any direction you would like to go, so that we could incorporate everything that is in 2.5, or go into the more detailed content of 3 in whatever direction you want, so that there is a seamless transition between high-level and detailed descriptions. This is DDI 4. We should provide a lot of different options for how much detail the user wants. With 4 right now we have detailed descriptions of a lot of things, but we are not allowing for high-level descriptions. Description and definition were discussed in London with respect to Drupal, so there could be radio buttons to indicate that they should be used to standardize those objects. It could be possible to have a description without using any detailed sub-elements.

There could be an attribute indicating that a description is a high-level description. Or we could have an element saying this is the Sample Description. Just having an element called description associated with identifiable objects may not be sufficient. In the annotated identifiable there is an annotation element that has Dublin Core properties like Title, Contributor, etc. It has an abstract. But there is nothing that is a high-level description.

On the one hand it might be nice to have a Sampling Description, but it might be over-specified. It's important to have an element dedicated to a high-level description that you are offering in place of the detail or as a supplement to the detail. A general description like the annotation will lose semantic interoperability. We need machine-interpretability. We also want the possibility to reference just the high level description in the simple codebook.

We should be able to allow for user-defined views that provide whatever level of detail an organization uses. A Simple Codebook view that maps back to 2.5 would be useful. It would allow those organizations just using 2 to feel comfortable using 4.
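As a rough illustration of what a user-defined view could mean in practice, the sketch below treats a view as a named set of properties that is applied to a fuller metadata record. The property names, the view contents, and the `apply_view` helper are all hypothetical and are not taken from DDI 2.5 or DDI 4.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical views: each one is just the set of properties it exposes.
SIMPLE_CODEBOOK_VIEW: FrozenSet[str] = frozenset(
    {"title", "sampling_overview", "variables"}
)
DETAILED_VIEW: FrozenSet[str] = SIMPLE_CODEBOOK_VIEW | frozenset(
    {"sampling_steps", "question_reuse"}
)


def apply_view(record: Dict[str, Any], view: FrozenSet[str]) -> Dict[str, Any]:
    """Return only the properties of the record that the view exposes."""
    return {key: value for key, value in record.items() if key in view}


# Hypothetical metadata record holding both high-level and detailed content.
record = {
    "title": "Example Household Survey",
    "sampling_overview": "Stratified random sample of households.",
    "sampling_steps": ["select areas", "select households"],
    "variables": ["age", "income"],
    "question_reuse": {"income": "reused from a prior wave"},
}

simple = apply_view(record, SIMPLE_CODEBOOK_VIEW)
detailed = apply_view(record, DETAILED_VIEW)
```

Under these assumptions the simple view is a strict subset of the detailed one, so an organization could start with the simple view and add detail later without restructuring its records.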

DDI 4 does not have the same hierarchy as DDI 3. We would still need an object carrying high level content for the sampling process and nothing else. In 3 there was a parent node but we don't have this structure in DDI 4, which means you need to create a container for this description. It's not a question of using description as a property containing the text, but which element carries the description.
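To make the container idea concrete, here is a minimal sketch in which an identifiable object carries a dedicated high-level description, with the detailed elements optional. All class and field names (`Identifiable`, `SamplingProcess`, `SamplingStep`, `overview`) are hypothetical and are not classes from the DDI 4 model.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import uuid4


@dataclass
class Identifiable:
    """Hypothetical base class: every object carries its own identifier."""
    id: str = field(default_factory=lambda: str(uuid4()))


@dataclass
class SamplingStep(Identifiable):
    """Hypothetical detailed element of a sampling process."""
    name: str = ""
    method: str = ""


@dataclass
class SamplingProcess(Identifiable):
    """Hypothetical container: a dedicated overview plus optional detail."""
    overview: Optional[str] = None          # high-level text description
    steps: List[SamplingStep] = field(default_factory=list)

    def has_detail(self) -> bool:
        """True when detailed steps accompany the overview."""
        return bool(self.steps)


# High-level description only, as a simple codebook might record it.
simple = SamplingProcess(overview="Stratified random sample of households.")

# The same container with detail supplementing the overview.
detailed = SamplingProcess(
    overview="Two-stage design.",
    steps=[
        SamplingStep(name="stage 1", method="PPS selection of areas"),
        SamplingStep(name="stage 2", method="simple random sample of households"),
    ],
)
```

The design choice illustrated is that the overview lives on a semantically specific object (the sampling process) rather than on a generic annotation, so software can tell what the text describes even when no detail is present.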

Between now and the next meeting, Oliver will make some slides with an example of what we have been talking about. We also need to dig into DDI 2.5 to get a handle on what is needed at the higher level. Dan and Larry will look at this. Dan will also consult with Wendy on this.