Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Expand
titleJune 8, 2015

Simple Codebook
June 8, 2015

Present: Dan Gillman, Larry Hoyle, Jenny Linnerud, Oliver Hopt, Mary Vardigan

The group continued to review the spreadsheet mapping DDI 2.* to DDI4 and noting items that the modeling should take up.

Then the group turned to the metadata that the statistical packages include. Larry provided a spreadsheet that he and Achim had developed to show which metadata were included in each of the major statistical packages. It will be important for Codebook to contain all of this metadata. There are other ways of handling data, like SQL, that might also be appropriate. In the Big Data world, Python is becoming popular. Python  is a general scripting language and has replaced the role that PERL had at one point. You can explicitly represent trees like JSON and XML, so it is very flexible. People have developed modules that do statistical kinds of things with Python.

Looking at all the software metadata from the statistical point of view is important. We need to make sure that everything in Larry's spreadsheet is accounted for in a meaningful way. We need to identify things that are not in the DDI 2.* spreadsheet. We can go through this all together or do assignments.

Number of significant digits is important in some scientific data. Whether the number has been rounded can be important. This should be included in DDI4. In 11179 community, there was a discussion of accuracy and precision. This is related to significant digits. The Data Description Team should address this. In an Instance Variable we may want to talk about significant digits while for a Represented Variable we talk about accuracy. We don't want to lose simple statistics on variables.

Larry and Dan will talk with the Data Description and Modeling teams about these issues.

 

 

Expand
titleJune 22 2015

Simple Codebook Meeting
June 22, 2015

Present: Michelle Edwards, Dan Gillman, Larry Hoyle, Mary Vardigan

Larry and Achim developed a spreadsheet that shows the metadata that is included in all the major statistical packages. This should go up on the site.

The group began by talking about category groups in DDI Codebook. We now have the order relation mechanism in DDI 4 to handle hierarchies in categories. None of the CESSDA archives uses this in DDI Codebook. Do we need to map back to this? We have other ways of handling this in DDI 4 with classification schemes, etc., but it would be hard to map explicitly. Should we be deprecating some of these elements and attributes? We don't want to lose the notion of statistics in the Codebook View. In 4 there are summary statistics and category statistics that roll up to the instance variable. This is in complex data type and imported from 3.2. There is a Variable Statistic that belongs to a Statistical Summary which is attached to a physical instance. This has a reference to a variable and its payload in terms of what actual statistic content it has. It can be frequencies or aggregate summary statistics. We can represent the same content.

Are we providing a view that closely follows 2.5 or do we just want to map to 2.5? The mapping makes the most sense. If we do a copy we will mess up our model structure. There is no reuse in 2.5 and there is a way of thinking in 2.5 that may not match the thinking in 4. This approach could make tools more complicated, but ideally the tools using 2.5 will support the new way of doing it.

Right now there is a Simple Codebook package on Lion, but right now in the Simple Codebook view there is only Study Unit and Other Material. Is the Simple Codebook View intended to look like 2.5 or is it something new? We should create a new Codebook. What we need to do is how to migrate their 2.5 Codebook to a 4.0 Codebook. Most likely there will be a lot of people who choose to stay at 2.5. Tools will have to figure out how to map these things. Codebook 2.5 will need to be frozen and any changes to Codebook will be done in 4.0. To describe a process, you would need to convert 2.5 to 4 to harness the process. It will be incumbent on us to do this mapping, which we are doing.

We have several issues that are AG/Scientific Board issues:

  • Freezing 2.5 but it will be supported
  • Having a mapping
  • All new work will happen in 4 in the Codebook View

Tools for developing countries use DDI Codebook so this could be an issue.

We need to get the mapping as clear as we can before we give the spreadsheet to the modelers. We should provide this spreadsheet with more detail to the modelers – this is how you map 2.5 to 4.0 for the Codebook View. Then we have to work with the modelers to figure out what the Codebook View looks like using the spreadsheet as a guide. Then the people doing tools in 2.5 will have a way to translate. We should be able to export 4 into a 2.5 framework like an API and it should be readable. It's a binding called a coding to map attributes. This means that the community using 2.5 with the available tools should be able to read and interoperate with a Codebook developed under 4.0. We also need to address whether we maintain the ability to write 2.5 out of 4.0 even though we will be making updates to the Codebook View over time. We will have to version the views. Version 1 of the Codebook View will be equivalent to 2.5 but Version 2 will not unless this is a constraint that we want to include.