
 Simple Codebook View Team

 


Meeting Minutes, 5 July 2016

Attendees: Dan Gillman, Michelle Edwards, Larry Hoyle, Gillian Kerr, Steve McEachern

The meeting focussed on reviewing the output of the previous meeting (see minutes below) for those who were unable to attend it.

As noted in the previous minutes, there are three core areas where this group needs to make decisions in order to progress for the Q3 release. Each is considered below.

 

Decisions for Modelling group

1. Optional vs mandatory content

It appears that the focus will be on making everything optional. Wendy Thomas has circulated a document on this through the SRG.

Larry commented that we need to consider what content is being imported into the Codebook view when we import the packages that we rely on - particularly those classes that are not appropriate or relevant.

Gillian also noted that if content IS mandatory, people will often enter nonsense content to comply with the field requirements - REDUCING data quality. She suggested that content might be better managed by making it optional, and then enabling links to reference content (e.g. ORCID for author/investigator information). Dan noted that content could be supported by VALUE or by REFERENCE - which would be one means of enabling this.
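
Dan's by-VALUE / by-REFERENCE distinction could be sketched roughly as below. This is only an illustration, not part of any DDI model: the names `ByValue`, `ByReference`, `Investigator` and `describe` are hypothetical, and the ORCID URL in the usage note is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class ByValue:
    """Content supplied inline, e.g. a typed-in investigator name."""
    text: str


@dataclass(frozen=True)
class ByReference:
    """Content supplied as a link to an external record, e.g. an ORCID iD."""
    uri: str


# An optional field: absent (None), given by value, or given by reference.
Investigator = Optional[Union[ByValue, ByReference]]


def describe(investigator: Investigator) -> str:
    """Render the field without forcing the producer to invent filler content."""
    if investigator is None:
        return "(not supplied)"
    if isinstance(investigator, ByReference):
        return f"see {investigator.uri}"
    return investigator.text
```

For example, `describe(ByReference("https://orcid.org/0000-0000-0000-0000"))` points the reader at the external record, while `describe(None)` simply reports that nothing was supplied, so an optional field never has to carry nonsense content.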

2. Citations

Should we retain the text string citation, or just use constructed citations? In Norway, the preference was to remove the text string. However, there are cases where a data producer requires specific recommended citation text.
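
One way to picture the two options side by side is a citation record that is normally constructed from its parts but can carry producer-required text. The sketch below is an assumption for discussion only: the `Citation` fields, the `render` function and the output format are hypothetical and are not taken from DDI or from the Norway output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Citation:
    creators: List[str]
    year: int
    title: str
    publisher: str
    # Optional free-text citation that a data producer requires verbatim.
    required_text: Optional[str] = None


def render(citation: Citation) -> str:
    """Prefer the producer's required text; otherwise build from the parts."""
    if citation.required_text:
        return citation.required_text
    creators = "; ".join(citation.creators)
    return f"{creators} ({citation.year}). {citation.title}. {citation.publisher}."
```

With this shape, dropping the text string entirely would mean removing `required_text`, which is exactly the case the data producer requirement argues against.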

3. Access conditions
There are a number of use cases that require managed access conditions. We don't yet have this on the work program (AG to consider) or a model to support access conditions (TC/Mod to consider). Dan suggested this might be added onto the AnnotatedIdentifiable class. It was suggested that the Codebook group develop an approach and propose it to Modelling/AG for broader usage.

Decisions for the SimpleCodebook group (or for referral by SC to other teams)

1. Geographic polygons

Why opt these out? Can include by reference?

2. Variable metadata

There is content from 2.5 that is basically describing fixed width files. If we want to include something along these lines (and this was agreed by consensus from the Codebook group - although we may improve upon the current handling within 2.5) then the DataDescription group needs to address this. (Steve to include in next DD agenda)

Larry noted that we probably should include the relevant PhysicalLayout classes and attributes (contributes :-)

3. ResponseUnit and AnalysisUnit

These are rather mixed up in Codebook. Dan suggested referring back to the Unit/UnitType/Population/Universe content in the Conceptual package. But we need to carefully specify the relationships.

Going forward:

- Draw on the existing content on Access Conditions and the available classes from the Data Description
- There is content that will be incomplete in the Codebook Q3 release - need to recognise this
- Also want to consider the extent to which we improve past content (e.g. V2.5 methodology)

To do:
- Steve to raise Variable metadata (and FixedWidth content) with DataDescription group
- Modelling/AG need to consider AccessConditions (Codebook potentially to provide a solution)
- Unit/ResponseUnit/AnalysisUnit: may need to be postponed


Minutes of Simple Codebook group, Tuesday June 21, 2016

Attendees: Steve McEachern, Larry Hoyle, Oliver Hopt

Review of the output from Norway continued. The group focussed on how to resolve the outstanding fields identified in the DDI-C profile (from the Google spreadsheet at https://docs.google.com/spreadsheets/d/1VDbVz2KRRSX_KEf0IfuE-QqMyTDupftCZfBdBM6VPT8/edit#gid=1652443366).

The proposal was for the remaining content to be addressed through three mechanisms:
- Referral to AG/Modelling group for "general approach" matters (such as Citation)
- Specific issues for the team to resolve (or to be addressed by related teams including DataDescription)
- Content that needed to be deferred due to dependencies on future activities of current working groups (Methodology and Physical Layout).

Details of each are below.

Discussions for modelling and/or advisory group:
1. Optional vs. mandatory content
2. Citations: text citations (e.g. Bibliographic Citation) vs. constructed/compiled/generated citation (from constituent parts)
- (also need to account for required citation text from data producers)
- Dublin Core: BibCit is one of the DCMI terms (but not the core 15 terms)
3. Access conditions:
- Whole datasets (DDI-C, DDI-L profiles)
- Variables within datasets (DDI-C, DDI-L profiles)
- Units within datasets
- Cases (records?) within datasets
- Metadata (e.g. Census RDC content restricts information on variable metadata)
AND
- What content is required within the access conditions (there was a model mentioned that may be a candidate)
- Variable Security and Variable Embargo (from DDI2.5)

SC team (or related teams) to resolve - Additional questions/fields outstanding:
1. Geographic Polygons

2. Variable metadata:
- VariableFiles: (files that contain this variable??) - probably covered by a DDI4 Relationship - recommend deferring this if needed (as may be part of future DataDescription model development)
- VariableInterval: continuous or discrete

3. ResponseUnit and AnalysisUnit (and Unit of Measurement)
- Consider a situation where the respondent to a survey is a Parent but the unit of interest is the Child - and the unit of analysis might be either the Child or the Household. How do we describe these different "units"?
- In particular, "AnalysisUnit" is problematic - because the unit of analysis is dependent on the research use - not on the data as captured.
- Might be related to Viewpoint??
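
The Parent/Child/Household case above can be made concrete with a small sketch. The records, the field names and the `total_by` helper are all hypothetical and purely illustrative. The point is that the same captured rows support different units of analysis depending on the research use.

```python
from collections import defaultdict

# Hypothetical captured records: one row per child, reported by a parent.
records = [
    {"household": "H1", "respondent": "P1", "child": "C1", "hours_reading": 3},
    {"household": "H1", "respondent": "P1", "child": "C2", "hours_reading": 5},
    {"household": "H2", "respondent": "P2", "child": "C3", "hours_reading": 4},
]


def total_by(unit_key, rows):
    """Sum the measure per chosen unit of analysis (e.g. 'child' or 'household')."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[unit_key]] += row["hours_reading"]
    return dict(totals)


by_child = total_by("child", records)          # analysis unit: Child
by_household = total_by("household", records)  # analysis unit: Household
```

Nothing in the data as captured changes between the two calls. Only the analyst's choice of unit does, which is why recording a single "AnalysisUnit" against the dataset is problematic.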

Variable content characteristics may be best addressed now:

- VariableInterval,
- 3.2 dimensions such as format, scale, decimalPositions, ... - numeric representation, classification level, ... (3.2 ties this more closely to the data type). Fundamentally these are attributes of the DATA TYPE and the MEASUREMENT
- Also consider the SummaryStatistics (DDI-C 2.5) in this discussion (note that this is probably more a characteristic of the "set of datums" rather than the InstanceVariable)
- Should be addressed by DataDescription to make a recommendation on when these attributes will be incorporated
- Larry Hoyle recommends including PRECISION as an attribute of the measurement
- What other content is commonly available from statistical packages? Reference: Hoyle and Wackerow paper in IASSIST Quarterly, V39 N.3-4

Recommended for deferral
1. Methodology - all related fields
- All fields within Methodology section of DDI-C
- Also includes Imputation
This requires output of Methodology team

2. PhysicalLayout
- MissingData
- VariableLocationStart-End-... (i.e. location in FixedWidthFiles)
This content requires the fixed width layout from the DataDescription group - which may not be included in the initial DD preview release.
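
To illustrate why the start/end locations matter for reading the data at all, here is a minimal sketch of parsing one fixed-width record from per-variable positions. The `VariableLocation` class, the `parse_record` function, the variable names and the 1-based inclusive column convention are all assumptions made for this example. They are not the DDI 2.5 or DataDescription definitions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VariableLocation:
    name: str
    start: int  # 1-based start column, inclusive (assumed convention)
    end: int    # 1-based end column, inclusive (assumed convention)


def parse_record(line: str, layout: List[VariableLocation]) -> Dict[str, str]:
    """Slice each variable out of a fixed-width line and trim padding."""
    return {v.name: line[v.start - 1 : v.end].strip() for v in layout}


# Hypothetical layout and record.
layout = [
    VariableLocation("case_id", 1, 4),
    VariableLocation("age", 5, 7),
    VariableLocation("sex", 8, 8),
]
record = parse_record("0001 342", layout)
```

Without the location metadata the line `0001 342` cannot be split into variables, which is the dependency on the DataDescription group's fixed-width layout noted above.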

 

...


Codebook meeting

2 February 2016

Attending: Dan, Michelle, Steve, Oliver, Jon, Larry, Jared

There’s some lack of clarity about where this group is at.  Discussed what to include in simple codebooks.  One idea is to review the spreadsheet of common elements (summary of CESSDA) and build on that.  Essentials seem to include: enough information to read the data into statistical package, label values, understand universe, understand what measure means so you can interpret the data, attribution information.  Another idea is to look at examples of simple codebooks, identify what they use, and then map to a model.

We need to be careful to keep things simple.  Even older versions of DDI 2 weren’t exactly simple.

If we nail down definitions, then do we make instances of previous versions incompatible?  As we define what information elements we want in DDI 4.0, we can specify which element you want in 2 if you’re going backwards.  

Next steps:

  1. Michelle will go through spreadsheet and narrow down to those elements that are DDI Lite and any others that are heavily used (e.g., key words).

  2. Will paste those elements into new sheet within the spreadsheet.

...