Simple Codebook Meeting
February 16, 2015

Present: Dan Gillman, Larry Hoyle, Steve McEachern, Mary Vardigan

Completeness of crosswalk between 2 and 3

The crosswalk or mapping is essentially one-way, from 2 to 3. Codebook doesn't have the reusability that Lifecycle does. This is the same issue as mapping between SPSS and Stata/SAS. We should look at the mapping in more detail.

Content and functionality of Simple Codebook

We want to make sure that Simple Codebook lets us write or ingest 2.x fairly seamlessly. Are the same kinds of element names available in 3? The names change even at the highest level.

Many people miss the Tag Library because it was so simple. This kind of resource would be useful along with a mapping. However, Wendy advises that we don't have to worry about 2 since the mapping is there.

Even 2 has a lot of content. Are we still talking about a simple codebook as opposed to a complex codebook? Simple should allow you to take information from a major statistical package and move it to another without losing any information (this is our definition of simple). Questions should be included, as should sampling and universe. We should review DDI Lite and DDI Core, which have not been updated to the most recent versions of Codebook and Lifecycle. This may give us a framework for content. We will deal with functionality later.

We have been assuming that we have the Instrument information and the Data Description information from those two views. What else do we need? We need context or study-level information: universe, sampling, design, and bibliographic information. In DDI 2.* we have Citation, Study information (which is discovery related), Methodology, and Access. This is good content.

What do you need to know to use the data? You need the variable information. Question order and the way questions are asked may be important.

There is a tension between being very simple and following best practice for good documentation. Can we add pointers to relevant information? The simple/complex distinction is one of level of detail.

For secondary users, we need enough information for a researcher to be able to understand and evaluate the quality of a dataset without reference back to the original data producer. We also need enough information to pull the data into a statistical package.

We started an exercise to take the mandatory element sets of CESSDA, ICPSR, and IHSN and work out their superset. The spreadsheet can be found in the attachments on the page: Simple Codebook View Team. We should compare this set of elements to what is in DDI Lite and DDI Core.

Necessary for a simple codebook: variables, questions, and layout; universe or population; level of geography (basically coverage, including temporal and subject coverage); and sampling or weights (with a pointer to a thorough description of the sampling).

The distinction between simple and complex for data description is between a simple rectangular file and other data types; this applies to the codebook in some ways as well. But there may be a cascading effect if we limit ourselves to simple rectangular files (we should also describe hierarchical files, such as the CPS). You can have hierarchical data in CSV with a record type field, but historically we have had files with esoteric physical representations of the data. How much of this do we need to handle? For a simple codebook, the representation should be limited to Unicode or something like that.

Homework: review DDI Core: http://www.ddialliance.org/sites/default/files/ddi3/DDI3_CR3_Core.xml and DDI Lite: http://www.ddialliance.org/sites/default/files/ddi-lite.html

Also think about what limitations we want to put on format to keep the idea of a simple codebook while keeping it rich enough to cover enough situations.

The next meeting will be in two weeks on March 2.

Simple Codebook Meeting
March 2, 2015

Present: Michelle Edwards, Dan Gillman, Oliver Hopt, Larry Hoyle, Steve McEachern, Mary Vardigan

The group welcomed Michelle Edwards of CISER. The chair noted that this group is, in a sense, waiting for other groups (Discovery, Data Description, Instrument) to complete their work so that we can finish ours. We recognize a need to incorporate both Codebook and Lifecycle into one specification (DDI 4), so we have been exploring that in our group a bit.

DDI Lite was reviewed and compared with the element sets that ICPSR, GESIS, and IHSN use and they are a fairly good match.

We won't be able to exactly duplicate Codebook and Lifecycle as views of DDI 4 but we can get close. Organizations that have invested in 3.2 do not want to lose that investment. Can we map 3.2 to 4 by automatically importing what's in 3.2? We may need a conversation with Guillaume about this. This should probably be at the Advisory Group level.

DDI Codebook and Lifecycle have different names for the same element. We will need mappings for people.

What we write out is also important. Interoperability can be defined in terms of reading into and writing out of a system. If we can read 2.5 into 4, we are able to ingest anything that occurs anywhere under 2.5. We want to be able to write an instance that contains all the semantic content of Codebook. Where we know there is an equivalence, a 2.5 writer should write the element out under its 2.5 name. It is the structure and the mappings that matter.

There were changes between Codebook and Lifecycle that were not necessarily clean because of the use of things by reference in 3 (categories and codes). Upward compatibility may be tougher than downward compatibility. We should probably not worry about 3 here but concern ourselves with mapping 2.5 into 4.

Is Codebook still an aggregation of Discovery, Description, and Instrument? Right now Discovery is a stripped down element set.

We could use 2.5 as a starting point, since we need to be able to account for its content. Then we could look at 4 and ask whether everything is covered. Can we restrict this to 2.5 Lite? Generally, yes.

A Codebook view would be intended for an audience that is creating or managing codebooks and it doesn't matter what things are in other views or packages.

Views can overlap as much as you want. DDI Lite is a view. DDI 2.5 is a view. We are leveraging the experience of all these repositories (ICPSR, GESIS, IHSN) in serving up data, and that experience is what makes a good codebook. It makes sense to rely on DDI Lite, which we know is used.

The group reviewed the elements in DDI Lite. ADA uses a few other elements, such as deposit date, alternative title, and collection situation. ADA uses the default Nesstar template, which is close to DDI Lite. We should look at Nesstar also. The CESSDA Profile would be the best thing to use. We need to identify where things are already defined in 4 and where things still need to be defined in 4. We need to know what is missing from 4 in order to have a sense of where we stand. Our group could then go to the AG to say what needs to be addressed in sprints.

If we have something in 4 that maps to the Nesstar/CESSDA profile, that allows a big chunk of DDI users to adopt 4. There is another migration path we can look at: we have the 2.5 Codebook, but is there a more modern one? Could we migrate 2.5 to something different? This may be out of scope for our group, but we should discuss it.