Simple Codebook Meeting
June 22, 2015

Present: Michelle Edwards, Dan Gillman, Larry Hoyle, Mary Vardigan

Larry and Achim developed a spreadsheet that shows the metadata that is included in all the major statistical packages. This should go up on the site.

The group began by talking about category groups in DDI Codebook. DDI 4 now has the order relation mechanism to handle hierarchies in categories. None of the CESSDA archives uses category groups in DDI Codebook. Do we need to map back to them? We have other ways of handling this in DDI 4 with classification schemes, etc., but it would be hard to map explicitly. Should we be deprecating some of these elements and attributes?

We don't want to lose the notion of statistics in the Codebook View. In 4 there are summary statistics and category statistics that roll up to the instance variable. This is modeled as a complex data type and was imported from 3.2. There is a Variable Statistic that belongs to a Statistical Summary, which is attached to a physical instance. The Variable Statistic has a reference to a variable and a payload holding the actual statistic content, which can be frequencies or aggregate summary statistics. We can represent the same content.
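The statistics structure described above can be sketched roughly as follows. This is only an illustration: the class and field names (`PhysicalInstance`, `StatisticalSummary`, `VariableStatistic`, `variable_ref`, `category_frequencies`, `summary_statistics`) are assumptions chosen to mirror the wording of these minutes, not the actual DDI 4 model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VariableStatistic:
    """Statistics for one variable (hypothetical shape, not the DDI 4 model)."""
    variable_ref: str  # reference to the instance variable
    # Payload: category frequencies and/or aggregate summary statistics.
    category_frequencies: Dict[str, int] = field(default_factory=dict)
    summary_statistics: Dict[str, float] = field(default_factory=dict)


@dataclass
class StatisticalSummary:
    """Holds the variable statistics for one physical instance."""
    variable_statistics: List[VariableStatistic] = field(default_factory=list)

    def for_variable(self, variable_ref: str) -> List[VariableStatistic]:
        """Return every statistic that points at the given variable."""
        return [s for s in self.variable_statistics
                if s.variable_ref == variable_ref]


@dataclass
class PhysicalInstance:
    """A data file with its attached statistical summary."""
    name: str
    statistical_summary: StatisticalSummary = field(
        default_factory=StatisticalSummary)


# Made-up example data: one categorical and one numeric variable.
instance = PhysicalInstance(name="survey_2015.dat")
instance.statistical_summary.variable_statistics.append(
    VariableStatistic("var_sex", category_frequencies={"1": 480, "2": 520}))
instance.statistical_summary.variable_statistics.append(
    VariableStatistic("var_age",
                      summary_statistics={"mean": 41.5, "min": 18.0, "max": 90.0}))
```

In this sketch the same container carries both kinds of payload, which is the sense in which the 2.5 content can be represented without a separate structure for each kind of statistic.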

Are we providing a view that closely follows 2.5 or do we just want to map to 2.5? The mapping makes the most sense. If we do a copy we will mess up our model structure. There is no reuse in 2.5 and there is a way of thinking in 2.5 that may not match the thinking in 4. This approach could make tools more complicated, but ideally the tools using 2.5 will support the new way of doing it.

Right now there is a Simple Codebook package on Lion, but the Simple Codebook View contains only Study Unit and Other Material. Is the Simple Codebook View intended to look like 2.5 or is it something new? We should create a new Codebook. What we need to work out is how users can migrate their 2.5 Codebook to a 4.0 Codebook. Most likely there will be a lot of people who choose to stay at 2.5. Tools will have to figure out how to map these things. Codebook 2.5 will need to be frozen and any changes to Codebook will be done in 4.0. To describe a process, you would need to convert 2.5 to 4 to harness the process. It will be incumbent on us to do this mapping, which we are doing.

We have several issues that are AG/Scientific Board issues:

  • Freezing 2.5 while continuing to support it
  • Having a mapping
  • All new work will happen in 4 in the Codebook View

Tools for developing countries use DDI Codebook so this could be an issue.

We need to get the mapping as clear as we can before we give the spreadsheet to the modelers. We should provide this spreadsheet with more detail to the modelers, showing how to map 2.5 to 4.0 for the Codebook View. Then we have to work with the modelers to figure out what the Codebook View looks like, using the spreadsheet as a guide. Then the people doing tools in 2.5 will have a way to translate. We should be able to export 4 into a 2.5 framework, like an API, and the result should be readable. This is a binding, called a coding, that maps attributes. It means that the community using 2.5 with the available tools should be able to read and interoperate with a Codebook developed under 4.0. We also need to address whether we maintain the ability to write 2.5 out of 4.0 even though we will be making updates to the Codebook View over time. We will have to version the views. Version 1 of the Codebook View will be equivalent to 2.5, but Version 2 will not be unless this is a constraint that we want to include.
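As a rough illustration of what a mapping-driven export might look like, the sketch below treats each spreadsheet row as a pair of a DDI 4 attribute and a DDI 2.5 element path. The DDI 4 attribute names (`StudyUnit.title`, `StudyUnit.abstract`, `Process.step`), the two mapping rows, and the function names are all assumptions made for the example, not the agreed mapping. XML namespaces are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rows, as they might be read from the spreadsheet:
# (DDI 4 attribute written as "Class.property", DDI 2.5 element path).
# Illustrative placeholders only, not the group's actual mapping.
MAPPING_V1 = [
    ("StudyUnit.title", "stdyDscr/citation/titlStmt/titl"),
    ("StudyUnit.abstract", "stdyDscr/stdyInfo/abstract"),
]


def export_to_codebook(record, mapping):
    """Write the mapped attributes of a DDI 4 style record as a 2.5 style tree."""
    root = ET.Element("codeBook")
    for attribute, path in mapping:
        if attribute not in record:
            continue
        node = root
        # Walk the path, creating each element only if it is missing.
        for tag in path.split("/"):
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = record[attribute]
    return root


def import_from_codebook(root, mapping):
    """Read a 2.5 style tree back into a flat DDI 4 style record."""
    record = {}
    for attribute, path in mapping:
        node = root.find(path)
        if node is not None and node.text is not None:
            record[attribute] = node.text
    return record


def unmapped_attributes(record, mapping):
    """List attributes that have no 2.5 target under this mapping version."""
    mapped = {attribute for attribute, _ in mapping}
    return [attribute for attribute in record if attribute not in mapped]


record = {"StudyUnit.title": "Example Study", "StudyUnit.abstract": "An example."}
root = export_to_codebook(record, MAPPING_V1)
print(ET.tostring(root, encoding="unicode"))
```

A later version of the Codebook View could carry attributes with no 2.5 counterpart. A check like `unmapped_attributes` is one way a tool could report what would be lost when writing 2.5 out of 4.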

 

Simple Codebook Meeting
July 6, 2015

Present: Dan Gillman, Larry Hoyle, Jenny Linnerud, Mary Vardigan

DDI 2.5 and DDI 4

Do we bring anything forward to the AG or go directly to the modelers? When we go through the spreadsheet again, are we asking for changes or is it more informational? At the AG meeting, when we discussed the issues from this group's previous meeting (freezing 2.5 and doing everything within DDI 4), we met with some resistance. We don't want to announce that we are freezing 2.5 until we have to. The basic thrust of what we want to do (manage everything in 4) doesn't seem to be that controversial, and we have to have that in place to move forward with what we are doing. If we put forward the set of requirements, they won't make sense until we have agreement on this approach. We are saying: this is what we have to have in 4 for us to be able to handle 2.5 and handle further refinements of 2.5 from within 4. Can we get everyone to agree that we want to maintain all the attributes of 2.5 in 4 and not have two separate management activities going on? We want to maintain at least all new things in 4. Right now 2.5 is in XML but there is no reason we can't bind it to RDF.

Relation to CSPA and GSIM

We have a sales job here. The modelers' way of doing a binding may force them into a certain way of describing objects. As long as you have everything you need in 4 to map to 2.5, you should be able to write a binding. The bindings should not drive the design.

The CSPA LIM (Logical Information Model) was undertaken partly because DDI was not delivering as fast as the NSIs wanted. Now we need to make sure that DDI and the LIM stay aligned so that we are conformant to GSIM. DDI should be a profile of GSIM and it should instantiate processes as GSIM does. GSIM is the higher-level, more abstract version of what DDI is becoming. In DDI we are filling in the details that GSIM leaves to the implementer, which reduces the amount of variability in the implementations.

LIM is supposed to be halfway between GSIM and a physical implementation. So far the LIM covers codelists and the next step is statistical classifications.

We don't want to see another standard with small differences we need to bridge.

DDI Codebook and Moving to 4

The perception of Lifecycle is that it has added complexity that people don't want to deal with. Some of the complexity comes from reuse. We may have some issues in terms of whether we can actually model 2.5 in the Codebook View of 4. There is a lot of stuff that you may have to bring along that ends up complicating things. But if attributes are what we really care about (combination of class and property – could be a relationship), we are totally flattening out the model into a set of these attributes and taking what we need. In terms of identification, we need to figure out what the requirements are in 2.5 and make sure there are attributes in 4 that handle all of those things in 2. If we have the flat model view of Codebook as not a view in the strict sense but essentially just an SQL dump of attributes out of 4, can we produce 2 from that? This is what we need to be able to show. This is how we need to present 2. As a group, we need to go down into the identification area and figure out how to map to 4. The binding doesn't have to take into account all the relationships that exist among all the classes – it is simply a dump of all the attributes.

We need to gather more evidence among our group. Once we resolve this, we can answer all the concerns. This requires a different way of thinking. We should be able to automatically say: we want the following attributes and write them out. There will be an issue of whether an instance of 2 can be ingested into 4 and make sense. DDI 2 does not indicate that code schemes are the same.
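The idea of flattening the model into attributes and writing out only the ones we want can be sketched as below. This is a minimal illustration: an attribute is taken to be a (class, property) pair, a relationship is stored as a property whose value is an identifier, and all class names, property names, and values are made up for the example.

```python
from typing import Dict, List, Set, Tuple

Attribute = Tuple[str, str]   # (class name, property name)
Row = Tuple[str, str, str]    # (class name, property name, value)


def flatten(objects: List[Tuple[str, Dict[str, str]]]) -> List[Row]:
    """Turn (class, properties) objects into flat attribute rows."""
    rows: List[Row] = []
    for class_name, properties in objects:
        for prop, value in properties.items():
            rows.append((class_name, prop, value))
    return rows


def dump(rows: List[Row], wanted: Set[Attribute]) -> List[str]:
    """Keep only the wanted attributes and write them as delimited lines."""
    return [f"{cls}.{prop}\t{value}"
            for cls, prop, value in rows
            if (cls, prop) in wanted]


# Hypothetical objects; "usesCodeList" stands in for a relationship.
objects = [
    ("InstanceVariable", {"name": "SEX", "label": "Sex of respondent",
                          "usesCodeList": "cl1"}),
    ("CodeList", {"id": "cl1", "name": "Sex codes"}),
]

# "We want the following attributes": everything else is left behind.
wanted = {("InstanceVariable", "name"), ("InstanceVariable", "label")}

lines = dump(flatten(objects), wanted)
```

Leaving `usesCodeList` out of `wanted` mirrors the point that the binding does not need to carry every relationship among the classes.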

Next Meeting

We will go back through the spreadsheet and make sure we have everything and are ready to send things to the modelers and then start to look at the IDs.