Codebook meeting 2016-05-10
Attending: Dan Gillman, Gillian Kerr, Oliver Hopt, Larry Hoyle
We reviewed the spreadsheet https://docs.google.com/spreadsheets/d/1VDbVz2KRRSX_KEf0IfuE-QqMyTDupftCZfBdBM6VPT8/edit#gid=1652443366 , sheet NewStartingPointCdbk_4, which now has XPaths to DDI 2.5 elements (column F) and descriptions of the corresponding DDI4 classes (column E), as well as descriptions of DDI4 classes that are still needed.
We discussed the creation of a view. We need two new classes: one for the whole activity producing data and one to describe each wave or phase. Some issues are associated with the top level (e.g. design), but there are also specifics at each repeated instance that produces different data. The general and the specific shouldn’t be duplicative for one-time activities. An example of top-level information would be the purpose of the whole set of activities. Another would be the funding source for the whole, or the authorizing legislation.
What terms could we use? Activity? Data capture activity? We need an anchor class and a specific class: “anchor class” and “concrete anchor instance class”.
Statistical agencies have ongoing activities whose designs change; the overall activity is known by a name and has a funding source (e.g. CPS or the American Community Survey). The specific might be a monthly collection, e.g. the monthly CPS as input to the calculation of the unemployment rate. Another example would be the Christmas Bird Count, which has annual data collections but can also be considered an overall series.
Decision: “StudySeries” for the overall and “Study” for the specific. The user community is familiar with this term, even if developers don’t like it. Conceptual would be the best current package for these classes. Oliver will create the classes; then the rest of us can work on descriptions. Larry will add other classes to the view.
Goodbye “TOFKAS” (The Object Formerly Known As Study). Even Prince went back to “Prince”.
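The StudySeries/Study split decided at this meeting can be pictured with a minimal sketch. Only the two class names come from the decision; every attribute and method name below (`purpose`, `funding_source`, `collection_period`, `purpose_of`) and all sample values are illustrative assumptions, not part of the DDI4 model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Study:
    """One specific wave or phase that produces data (e.g. one monthly collection)."""
    name: str
    collection_period: str          # hypothetical attribute, e.g. "2016-04"
    purpose: Optional[str] = None   # None means "not restated, use the series-level value"


@dataclass
class StudySeries:
    """The overall activity, known by a name and carrying top-level information."""
    name: str
    purpose: str                            # hypothetical attribute
    funding_source: Optional[str] = None    # hypothetical attribute
    studies: List[Study] = field(default_factory=list)

    def add_study(self, study: Study) -> Study:
        self.studies.append(study)
        return study

    def purpose_of(self, study: Study) -> str:
        """Return the wave's own purpose if it states one, else the series-level purpose."""
        return study.purpose if study.purpose is not None else self.purpose


# Placeholder values only; they do not describe the real CPS.
series = StudySeries(name="CPS", purpose="series-level purpose",
                     funding_source="series-level funding source")
wave = series.add_study(Study(name="Monthly CPS", collection_period="2016-04"))
print(series.purpose_of(wave))
```

Keeping the general information on the series and letting a wave restate it only when it differs is one way to avoid the duplication the group was concerned about.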
Notes from Codebook Meeting 2016-04-26, 8am EDT
...
Codebook meeting 2 February 2016
Attending: Dan, Michelle, Steve, Oliver, Jon, Larry, Jared
There’s some lack of clarity about where this group stands. We discussed what to include in simple codebooks. One idea is to review the spreadsheet of common elements (summary of CESSDA) and build on that. Essentials seem to include: enough information to read the data into a statistical package, label values, understand the universe, understand what a measure means so you can interpret the data, and attribution information. Another idea is to look at examples of simple codebooks, identify what they use, and then map to a model.
We need to be careful to keep things simple. Even older versions of DDI 2 weren’t exactly simple. If we nail down definitions, do we make instances of previous versions incompatible? As we define which information elements we want in DDI 4.0, we can specify which element to use in DDI 2 when going backwards.
Next steps:
...
Simple Codebook Meeting, March 16, 2015
Present: Dan Gillman, Oliver Hopt, Larry Hoyle, Mary Vardigan
The agenda for the meeting was to determine whether all elements in the CESSDA profile/Nesstar profile are present in DDI 4. Larry Hoyle had created a spreadsheet of DDI Lite and the list of elements from CESSDA profiles. The selection of elements and attributes varies widely among the repositories using DDI Lite. The Nesstar Webview comes as the base. The group compared elements used across different repositories. The task was to find out which elements are in DDI4, so the group decided to divide up the list of 200+ elements.
There do not appear to be any DDI4 elements about the metadata itself, the DDI document. This description basically parallels the study description information. This may not be relevant for DDI4. Perhaps the Data Citation group should think about this. The metadata is often the archive's intellectual property, so some representation of it will be of interest to most of the archives. Citing the user guide or documentation is a common practice.
DDI Codebook has some elements of description that DDI4 has not been talking about. We need to bring this to the Advisory Group as an issue for discussion. In DDI Lifecycle there is the corresponding instance with a citation on it. There is no DDI4 instance because instance is a root element for documents in general. Will the idea of a document description disappear in 4? The archive creates a document describing the data. The landing page is sometimes (always?) metadata. Study level, variable level, record level, file level: should the Data Citation group look at what the targets of citation are? In DDI Codebook, we have DocumentDescription; in DDI Lifecycle we have DDIInstance. Should DDIInstance be brought back into DDI4, with revised content but allowing attachment of annotation?
Being able to point to an XML file with the model and generate that file from elements in 4 is adequate. But it is no longer enough to point to one object that contains everything. We have the logical vs. physical distinction. A DDIInstance is a physical thing, something that's there. Pulling together the information into that representation is an activity with Authors, etc. There can be the "same" content in two archives, with different contact people and different URIs for each. This is parallel to data description.
Assignments for the next meeting: Where in DDI4 does each of these elements exist?
Simple Codebook Meeting, March 2, 2015
Present: Michelle Edwards, Dan Gillman, Oliver Hopt, Larry Hoyle, Steve McEachern, Mary Vardigan
The group welcomed Michelle Edwards of CISER. The chair noted that this group is in a sense waiting for other groups (Discovery, Data Description, Instrument) to complete their work so that we can finish ours. We recognize a need to incorporate both Codebook and Lifecycle into one spec (DDI 4), so we have been exploring that in our group a bit.
DDI Lite was reviewed and compared with the element sets that ICPSR, GESIS, and IHSN use, and they are a fairly good match. We won't be able to exactly duplicate Codebook and Lifecycle as views of DDI 4, but we can get close.
Organizations that have invested in 3.2 do not want to lose that investment. Can we map 3.2 to 4 by automatically importing what's in 3.2? We may need a conversation with Guillaume about this. This should probably be at the Advisory Group level.
DDI Codebook and Lifecycle have different names for the same element. We will need mappings for people. What we write out is also important. Interoperability can be defined in terms of reading into and writing out of a system. If we can read 2.5 into 4, we are able to ingest anything that occurs anywhere under 2.5. We want to be able to write an instance that contains all the semantic content of Codebook. If we know that there is an equivalence, we should have a 2.5 writer that writes the element out under its 2.5 name. It is the structure and the mappings that matter.
There were changes between Codebook and Lifecycle that were not necessarily clean because of the use of things by reference in 3 (categories and codes). Upward compatibility may be tougher than downward compatibility. We should probably not worry about 3 here but concern ourselves with mapping 2.5 into 4.
Is Codebook still an aggregation of Discovery, Description, and Instrument? Right now Discovery is a stripped-down element set.
We could take 2.5 as a starting point, and we need to be able to account for it. Then we could look at 4 and ask whether everything is covered. Can we restrict this to 2.5 Lite? Generally, yes. A Codebook view would be intended for an audience that is creating or managing codebooks, and it doesn't matter what things are in other views or packages. Views can overlap as much as you want. DDI Lite is a view. DDI 2.5 is a view. We are leveraging the experience of repositories (ICPSR, GESIS, IHSN) in serving up data, and that experience shows what makes a good codebook. It makes sense to rely on DDI Lite, which we know is used.
The group reviewed the elements in DDI Lite. ADA uses a few other elements, such as deposit date, alternative title, and collection situation. ADA uses the default Nesstar template, which is close to DDI Lite. We should look at Nesstar also. The CESSDA Profile would be the best thing to use.
We need to identify where things are already defined in 4 and where things still need to be defined in 4. We need to know what is missing from 4 in order to have a sense of where we stand. Our group could then go to the AG to say what needs to be addressed in sprints. If we have something in 4 that maps to the Nesstar/CESSDA profile, that allows a big chunk of DDI users to adopt 4.
There is another migration path we can look at: we have the 2.5 codebook; is there a more modern one? Should we migrate 2.5 to something different? This may be out of scope for our group, but we should discuss it.
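The reading and writing view of interoperability discussed at this meeting, together with the question of what is still missing from 4, can be illustrated with a small mapping table. This is a sketch under stated assumptions: the DDI 2.5 XPaths are written from memory of the Codebook schema and should be checked against the specification, and every DDI4 target name is a hypothetical placeholder, not an actual DDI4 class or property.

```python
from typing import Dict, List, Optional, Tuple

# DDI 2.5 XPath -> DDI4 target. Targets are HYPOTHETICAL placeholder names;
# None marks an element with no known DDI4 home yet.
DDI25_TO_DDI4: Dict[str, Optional[str]] = {
    "/codeBook/stdyDscr/citation/titlStmt/titl": "Study.title",
    "/codeBook/stdyDscr/citation/titlStmt/altTitl": "Study.alternativeTitle",
    "/codeBook/stdyDscr/citation/distStmt/depDate": None,
    "/codeBook/dataDscr/var/labl": "Variable.label",
}


def coverage(mapping: Dict[str, Optional[str]]) -> Tuple[List[str], List[str]]:
    """Split the 2.5 elements into those already mapped to 4 and those still missing."""
    mapped = sorted(x for x, target in mapping.items() if target is not None)
    missing = sorted(x for x, target in mapping.items() if target is None)
    return mapped, missing


def read_25_into_4(record_25: Dict[str, str],
                   mapping: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Reader: rename each mapped 2.5 element to its DDI4 name; drop unmapped ones."""
    return {mapping[x]: value for x, value in record_25.items()
            if mapping.get(x) is not None}


def write_4_as_25(record_4: Dict[str, str],
                  mapping: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Writer: emit each DDI4 value under its equivalent 2.5 name."""
    inverse = {target: x for x, target in mapping.items() if target is not None}
    return {inverse[name]: value for name, value in record_4.items() if name in inverse}


mapped, missing = coverage(DDI25_TO_DDI4)
source = {"/codeBook/stdyDscr/citation/titlStmt/titl": "Example title"}
round_trip = write_4_as_25(read_25_into_4(source, DDI25_TO_DDI4), DDI25_TO_DDI4)
print(missing)
print(round_trip == source)
```

The `missing` list is the kind of gap report the group could take to the AG; the round trip shows that a known equivalence is enough for a 2.5 writer to restore the original element name.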
...
Simple Codebook Meeting
Type of information | Basic Codebook | Survey | Fauna (Wildlife) |
---|---|---|---|
Data structure: record type; record layout; record relationship; data type; valid values; invalid values | Structured metadata to support access | Structured metadata to support access | Structured metadata to support access |
Data source: why the data was collected; how the data was collected; who collected the data; the universe or population and how it was identified and selected | Descriptive, to support assessment of quality and fitness for use | Purpose of the survey; survey content and flow (may or may not need to be actionable); identification and sampling of survey population (may or may not need to be actionable for replication purposes) | Purpose of study; how data was collected (may need to be actionable to support replication and/or calibration); identification and sampling of survey population (may or may not need to be actionable for replication purposes) |
Data processing: data capture process; validation; quality control; normalizing, coding, derivations; protection (confidentiality, suppression, interpolation, embargo, etc.) | Informational material; support provenance | May need structured metadata for purposes of replication; include processes, background information, proposed, actual, and implications for data | May need structured metadata to support mechanical capture instruments, calibrations, situational variants, etc. |
Discovery information: who; what; when; why; coverage (topical, temporal, spatial) | Structured metadata to support discovery and access to the data as a whole | Structured metadata to support discovery and access to the data as a whole | Structured metadata to support discovery and access to the data as a whole |
Conceptual basis: object; concept | Informational material | Structured to support analysis of change over time and relationship between studies. May just be descriptive / informational. | Structured to support genre-level comparison (heavy use of common taxonomies, etc.) |
Methodologies employed | Informational material | Structured to support replication and comparison between studies | Structured to support replication and comparison between studies |
Related materials of relevance to data | Informational material | | |
Definitions
Data Dictionary
· A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format."[1] The term can have one of several closely related meanings pertaining to databases and database management systems (DBMS):
· A document describing a database or collection of databases
· An integral component of a DBMS that is required to determine its structure
· A piece of middleware that extends or supplants the native data dictionary of a DBMS
· Database about a database. A data dictionary defines the structure of the database itself (not that of the data held in the database) and is used in control and maintenance of large databases. Among other items of information, it records (1) what data is stored, (2) name, description, and characteristics of each data element, (3) types of relationships between data elements, (4) access rights and frequency of access. Also called system dictionary when used in the context of a system design. Read more: http://www.businessdictionary.com/definition/data-dictionary.html#ixzz3Am5wCgZI
· A data dictionary is a collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them. (Posted by Margaret Rouse @ WhatIs.com)
Codebook
What is a codebook? (http://www.sscnet.ucla.edu/issr/da/tutor/tutcode.htm)
A codebook describes and documents the questions asked or items collected in a survey. Codebooks and study documentation will provide you with crucial details to help you decide whether or not a particular data collection will be useful in your research. The codebook will describe the subject of the survey or data collection, the sample and how it was constructed, and how the data were coded, entered, and processed. The questionnaire or survey instrument will be included along with a description or layout of how the data file is organized. Some codebooks are available electronically, and you can read them on your computer screen, download them to your machine, or print them out. Others are not electronic and must be used in a library or archive, or, depending on copyright, photocopied if you want your own for personal use.
Codebook : Lisa Carley-Baxter (http://srmo.sagepub.com/view/encyclopedia-of-survey-research-methods/n69.xml)
Codebooks are used by survey researchers to serve two main purposes: to provide a guide for coding responses and to serve as documentation of the layout and code definitions of a data file. Data files usually contain one line for each observation, such as a record or person (also called a "respondent"). Each column generally represents a single variable; however, one variable may span several columns. At the most basic level, a codebook describes the layout of the data in the data file and describes what the data codes mean. Codebooks are used to document the values associated with the answer options for a given survey question. Each answer category is given a unique numeric value, and these unique numeric values are then used by researchers in their analysis of the ...
Codebook (Wikipedia.com)
A codebook is a type of document used for gathering and storing codes. Originally codebooks were often literally books, but today codebook is a byword for the complete record of a series of codes, regardless of physical format.
ICPSR
What is a codebook?
A codebook provides information on the structure, contents, and layout of a data file. Users are strongly encouraged to look at the codebook of a study before downloading the datafiles.
While codebooks vary widely in quality and amount of information given, a typical codebook includes:
• Column locations and widths for each variable
• Definitions of different record types
• Response codes for each variable
• Codes used to indicate nonresponse and missing data
• Exact questions and skip patterns used in a survey
• Other indications of the content and characteristics of each variable
Additionally, codebooks may also contain:
• Frequencies of response
• Survey objectives
• Concept definitions
• A description of the survey design and methodology
• A copy of the survey questionnaire (if applicable)
• Information on data collection, data processing, and data quality
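
The first items in the list above (column locations and widths, response codes, and missing-data codes) are enough to read a fixed-width data file. The following is a minimal sketch of that idea; the variable names, column positions, codes, and sample records are all invented for illustration and do not come from any real codebook.

```python
from typing import Dict, List, Optional

# Hypothetical codebook entries: 1-based start column, width,
# response codes (value -> label), and codes that mean "missing".
CODEBOOK: List[dict] = [
    {"name": "ID", "start": 1, "width": 4, "codes": {}, "missing": set()},
    {"name": "SEX", "start": 5, "width": 1,
     "codes": {"1": "Male", "2": "Female"}, "missing": {"9"}},
    {"name": "AGE", "start": 6, "width": 3, "codes": {}, "missing": {"999"}},
]


def decode_record(line: str, codebook: List[dict]) -> Dict[str, Optional[str]]:
    """Slice one fixed-width record by column position and apply the codes."""
    row: Dict[str, Optional[str]] = {}
    for var in codebook:
        begin = var["start"] - 1                       # convert to 0-based index
        raw = line[begin:begin + var["width"]].strip()
        if raw in var["missing"]:
            row[var["name"]] = None                    # nonresponse / missing data
        else:
            row[var["name"]] = var["codes"].get(raw, raw)  # label if coded, else raw value
    return row


first = decode_record("00012 34", CODEBOOK)
second = decode_record("00029999", CODEBOOK)
print(first)
print(second)
```

Without the codebook, the string `00012 34` is uninterpretable; with it, each column range becomes a named variable with labeled values, which is the "enough information to read the data" role the definitions describe.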
...