
Simple Codebook Meeting
February 16, 2015

Present: Dan Gillman, Larry Hoyle, Steve McEachern, Mary Vardigan

Completeness of crosswalk between 2 and 3

It is essentially one-way from 2 to 3. Codebook doesn't have the reusability that Lifecycle does. This is the same issue as between SPSS and Stata/SAS. We should look at the mapping.

Content and functionality of Simple Codebook

We want to make sure that Simple Codebook lets us write or ingest 2.x fairly seamlessly. Are the same kinds of element names available in 3? The names change even at the highest level.

Many miss the Tag Library as it was so simple. This kind of resource would be useful along with a mapping. However, Wendy advises that we don't have to worry about 2 since the mapping is there.

Even 2 has a lot of content. Are we still talking about a simple codebook as opposed to a complex codebook? Simple should allow you to take information from a major statistical package and move it to another without losing any information (this is our definition of simple). Question information should be included, as should sampling and universe. We should review DDI Lite and DDI Core, which have not been updated to the most recent versions of Codebook and Lifecycle. This may give us a framework for content. We will deal with functionality later.

We assume that we have the instrument information and the data description information from those two views. What else do we need? Context or study-level information: universe, sampling, design, bibliographic information. Also citation, study information (which is discovery related), methodology, and access. Does access belong?

What do you need to know to use the data? You need the variable information. Question order and the way questions are asked may be important.

There is a tension between being very simple and following best practice for good documentation. Can we add pointers to relevant information? The simple/complex distinction is levels of detail.

For secondary users, we need enough information for a researcher to be able to understand and evaluate the quality of a dataset without reference back to the original data producer and to pull it into a statistical package.

Take the common set of CESSDA, ICPSR, and IHSN mandatory schemas and figure out what the superset is.
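As a rough illustration of what "superset" means here, the comparison could be sketched with set operations. The element names below are placeholders invented for the example; they are not the actual mandatory elements of CESSDA, ICPSR, or IHSN, which would have to be taken from each organization's schema.

```python
# Placeholder element lists for illustration only. These are NOT the real
# mandatory element sets of CESSDA, ICPSR, or IHSN.
MANDATORY = {
    "CESSDA": {"title", "universe", "abstract"},
    "ICPSR": {"title", "universe", "citation"},
    "IHSN": {"title", "sampling", "abstract"},
}

# Superset: every element that at least one organization requires.
superset = set().union(*MANDATORY.values())

# Common core: elements that all three organizations require.
common = set.intersection(*MANDATORY.values())

print("superset:", sorted(superset))
print("common:", sorted(common))
```

The superset is the candidate content list for Simple Codebook; the common core shows which elements are uncontroversial.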

Necessary: variables, questions, and layout; universe or population; level of geography (basically coverage); sampling or weights (with a pointer to a thorough description of sampling).

The distinction between simple and complex for data description is between a simple rectangular file and other data types; this applies to codebook in some ways as well. Is there a cascading effect if we limit ourselves to simple rectangular files? We would be limiting ourselves (we should describe hierarchical files as well, such as CPS). If we are describing the files themselves, you can describe qualitative files as objects with the existing DDI. You can have hierarchical data in CSV with a record type field, but historically we have had files with physical representations of the data.
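To make the point about hierarchical data in CSV concrete, here is a minimal sketch of reading a file in which the first field is a record type. The layout (household rows marked "H", person rows marked "P"), the field order, and the sample values are all assumptions made up for the example; they do not reflect the actual CPS file layout.

```python
import csv
import io

# Hypothetical sample: "H" rows are households, "P" rows are persons that
# belong to the most recent household row above them.
SAMPLE = """\
H,1001,CA
P,1001,1,34
P,1001,2,31
H,1002,NY
P,1002,1,58
"""


def read_hierarchical(text):
    """Group person records under the household record that precedes them."""
    households = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        record_type = row[0]
        if record_type == "H":
            households.append({"id": row[1], "state": row[2], "persons": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record before any household record")
            households[-1]["persons"].append(
                {"line": int(row[2]), "age": int(row[3])}
            )
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households


households = read_hierarchical(SAMPLE)
for h in households:
    print(h["id"], h["state"], len(h["persons"]))
```

The file is not rectangular, since the two record types have different numbers of fields, so a codebook for it has to document each record type's layout separately.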

For a simple codebook, the simple representation needs to be limited to Unicode or something like that.

Homework: review DDI Core: http://www.ddialliance.org/sites/default/files/ddi3/DDI3_CR3_Core.xml and DDI Lite: http://www.ddialliance.org/sites/default/files/ddi-lite.html

Also think about what limitations we want to put on format to keep the idea of a simple codebook while keeping it rich enough to cover enough situations.