Codebook High Level Document

The following content will be made available as part of an HTML document including the field level documentation or as a separate document. Please consider the following when commenting on the content of this document:

  • Publication as a part of the field level document or as a separate document

  • Clarity of the content included in this document

  • Additional content to be added prior to publication

Intended Use of Codebook

Documentation of a simple study. Basic descriptive content for variables, files, source material, and study level information. Supports discovery, preservation, and the informed use of data. 

Codebook version 2.6 - Tree structure

Numbering locations are valid for version 2.6 only and reflect the current order and nesting of this version
Field Name (min-max) [attributes]
Heavily repeated attributes have not been included here
--All elements support a basic set of attributes including: ID, xml:lang, source, elementVersion, elementVersionDate, ddiLifecycleUrn, ddiCodebookUrn
--See High Level documentation for details on sets of string element types and their specific attributes

0 codeBook [version, codeBookAgency]

1.1 citation (0-1) [MARCURI]
1.1.2 titlStmt (1-1)
1.1.2.1 titl (1-1)
1.1.2.2 subTitl (0-n)
1.1.2.3 altTitl (0-n)
1.1.2.4 parTitl (0-n)
1.1.2.5 IDNo (0-n) [agency, isPersistentIdentifier]
1.1.3 rspStmt (0-1)
1.1.3.1 AuthEnty (0-n) [affiliation, abbr, personalID, typeOfPersonalID]
1.1.3.2 othId (0-n) [type, role, abbr, affiliation, personalID, typeOfPersonalID]
1.1.4 prodStmt (0-1)
1.1.4.1 language (0-n) [typeOfLanguageCode, languageCode]
1.1.4.2 producer (0-n) [abbr, affiliation, role, personalID, typeOfPersonalID]
1.1.4.3 copyright (0-n)
1.1.4.4 license (0-n) [URI, type]
1.1.4.5 prodDate (0-n)
1.1.4.6 prodPlac (0-n)
1.1.4.7 software (0-n) [version]
1.1.4.8 fundAg (0-n) [affiliation, abbr, role, personalID, typeOfPersonalID]
1.1.4.9 grantNo (0-n) [agency, role]
1.1.5 distStmt (0-1)
1.1.5.1 distrbtr (0-n) [abbr, affiliation, URI, personalID, typeOfPersonalID]
1.1.5.2 contact (0-n) [required, formNo, URI]
1.1.5.3 depositr (0-n) [abbr, affiliation, personalID, typeOfPersonalID]
1.1.5.4 depDate (0-n)
1.1.5.5 distDate (0-n)
1.1.6 serStmt (0-n) [URI]
1.1.6.1 serName (0-n) [abbr]
1.1.6.2 serInfo (0-n)
1.1.7 verStmt (0-n)
1.1.7.1 version (0-n) [type]
1.1.7.2 verResp (0-n) [affiliation, personalID, typeOfPersonalID]
1.1.7.3 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
1.1.9 biblCit (0-n) [format]
1.1.10 holdings (0-n) [location, callno, URI, media]
1.2 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
1.3 guide (0-n)
1.4 docStatus (0-n)
1.5 docSrc (0-n) --SEE Citation Contents section 1.1 inclusive--
1.5.1 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
1.6 controlledVocabUsed (0-n)
1.6.1 codeListID (0-1)
1.6.2 codeListName (0-1)
1.6.3 codeListAgencyName (0-1)
1.6.4 codeListVersionID (0-1)
1.6.5 codeListURN (0-1)
1.6.6 codeListSchemeURN (0-1)
1.6.7 usage (1-n)
1.6.7.1 selector (0-1)
1.6.7.2 specificElements (0-1) [refs, authorizedCodeValue]
1.6.7.3 attribute (0-1)
1.7 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]

2.1 citation (1-n) --SEE Citation Contents section 1.1 inclusive--
2.1.1 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.2 studyAuthorization (0-n) [date]
2.2.1 authorizingAgency (0-n) [affiliation, abbr, personalID, typeOfPersonalID]
2.2.2 authorizationStatement (0-n)
2.3 stdyInfo (0-n)
2.3.1 studyBudget (0-n)
2.3.2 subject (0-n)
2.3.2.1 keyword (0-n)
2.3.2.2 topcClas (0-n)
2.3.3 abstract (0-n) [contentType]
2.3.4 sumDscr (0-n)
2.3.4.1 timePrd (0-n) [event, cycle]
2.3.4.2 collDate (0-n) [event, cycle]
2.3.4.3 nation (0-n) [vocab, vocabURI, vocabInstanceURI]
2.3.4.4 geogCover (0-n)
2.3.4.5 geogUnit (0-n)
2.3.4.6 geoBndBox (0-1)
2.3.4.6.1 westBL (1-1)
2.3.4.6.2 eastBL (1-1)
2.3.4.6.3 southBL (1-1)
2.3.4.6.4 northBL (1-1)
2.3.4.7 boundPoly (0-n)
2.3.4.7.1 polygon (1-n)
2.3.4.7.1.2 point (1-n)
2.3.4.7.1.2.1 gringLat (1-1)
2.3.4.7.1.2.2 gringLon (1-1)
2.3.4.8 anlyUnit (0-n) [unit]
2.3.4.9 universe (0-n) [level, clusion]
2.3.4.10 dataKind (0-n) [type]
2.3.4.11 generalDataFormat (0-n)
2.3.5 qualityStatement (0-1)
2.3.5.1 standardsCompliance (0-n)
2.3.5.1.1 standard (1-1)
2.3.5.1.2 standardName (0-n) [date, version, URI]
2.3.5.1.3 producer (0-n) [abbr, affiliation, role, personalID, typeOfPersonalID]
2.3.5.2 complianceDescription (0-n)
2.3.5.2 otherQualityStatement (0-1)
2.3.6 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.3.7 exPostEvaluation (0-n) [completionDate, type]
2.3.7.1 typeOfExPostEvaluation (0-n)
2.3.7.2 evaluator (0-n) [affiliation, abbr, role, personalID, typeOfPersonalID]
2.3.7.3 evaluationProcess (0-n)
2.3.7.4 outcomes (0-n)
2.4 studyDevelopment (0-n)
2.4.1 developmentActivity (0-n) [type]
2.4.1.1 typeOfDevelopmentActivity (0-n)
2.4.1.2 description (0-n)
2.4.1.3 participant (0-n) [affiliation, abbr, role, personalID, typeOfPersonalID]
2.4.1.4 resource (0-n)
2.4.1.4.1 typeOfDataSrc (0-1)
2.4.1.4.2 dataSrc (0-n)
2.4.1.4.3 srcOrig (0-n)
2.4.1.4.4 srcChar (0-n)
2.4.1.4.5 srcDocu (0-n)
2.4.1.5 outcome (0-n)
2.5 method (0-n)
2.5.1 dataColl (0-n)
2.5.1.1 timeMeth (0-n) [method]
2.5.1.2 dataCollector (0-n) [abbr, affiliation, role, personalID, typeOfPersonalID]
2.5.1.3 collectorTraining (0-n) [type]
2.5.1.4 frequenc (0-n) [freq]
2.5.1.5 sampProc (0-n)
2.5.1.6 sampleFrame (0-n)
2.5.1.6.1 sampleFrameName (0-n)
2.5.1.6.2 labl (0-n) [level, vendor, country, sdatrefs]
2.5.1.6.3 txt (0-n) [level, sdatrefs]
2.5.1.6.4 validPeriod (0-n) [event]
2.5.1.6.5 custodian (0-n) [affiliation, abbr, role, personalID, typeOfPersonalID]
2.5.1.6.6 useStmt (0-n)
2.5.1.6.6.1 confDec (0-n) [required, formNo, URI]
2.5.1.6.6.2 specPerm (0-n) [required, formNo]
2.5.1.6.6.3 restrctn (0-n)
2.5.1.6.6.4 contact (0-n) [required, formNo, URI]
2.5.1.6.6.5 citReq (0-n)
2.5.1.6.6.6 deposReq (0-n)
2.5.1.6.6.7 conditions (0-n)
2.5.1.6.6.8 disclaimer (0-n)
2.5.1.6.7 universe (0-n) [level, clusion]
2.5.1.6.8 frameUnit (0-n) [isPrimary]
2.5.1.6.8.1 unitType (1-1) [numberOfUnits]
2.5.1.6.8.2 txt (0-n) [level, sdatrefs]
2.5.1.6.9 referencePeriod (0-n)
2.5.1.6.10 updateProcedure (0-n)
2.5.1.7 targetSampleSize (0-n)
2.5.1.7.1 sampleSize (0-1)
2.5.1.7.2 sampleSizeFormula (0-n)
2.5.1.8 deviat (0-n)
2.5.1.9 collMode (0-n)
2.5.1.10 resInstru (0-n) [type]
2.5.1.11 instrumentDevelopment (0-n) [type]
2.5.1.12 sources (0-1) [typeOfDataSrc, dataSrc, sourceCitation, srcOrig, srcChar, srcDocu, sources]
2.5.1.12.1 typeOfDataSrc (0-1)
2.5.1.12.2 dataSrc (0-n)
2.5.1.12.3 sourceCitation (0-n)
2.5.1.12.4 srcOrig (0-n)
2.5.1.12.5 srcChar (0-n)
2.5.1.12.6 srcDocu (0-n)
2.5.1.12.7 sources (0-n) --RECURSIVE--
2.5.1.13 collSitu (0-n)
2.5.1.14 actMin (0-n)
2.5.1.15 ConOps (0-n) [agency]
2.5.1.16 weight (0-n)
2.5.1.17 cleanOps (0-n) [agency]
2.5.2 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.5.3 anlyInfo (0-1)
2.5.3.1 respRate (0-n)
2.5.3.2 EstSmpErr (0-n)
2.5.3.3 dataAppr (0-n) [type]
2.5.4 stdyClas (0-n) [type]
2.5.5 dataProcessing (0-n) [type]
2.5.6 codingInstructions (0-n) [type, relatedProcesses]
2.5.6.1 typeOfCodingInstruction (0-n)
2.5.6.2 txt (0-n) [level, sdatrefs]
2.5.6.3 command (0-n) [formalLanguage]
2.6 dataAccs (0-n)
2.6.1 typeOfAccess (0-1)
2.6.2 setAvail (0-n) [media, callno, label, type]
2.6.2.1 typeOfSetAvailability (0-n)
2.6.2.2 accsPlac (0-n)
2.6.2.3 origArch (0-n) [affiliation, abbr, URI, personalID, typeOfPersonalID]
2.6.2.4 avlStatus (0-n)
2.6.2.5 collSize (0-n)
2.6.2.6 complete (0-n)
2.6.2.7 fileQnty (0-n)
2.6.2.8 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.6.3 useStmt (0-n)
2.6.3.1 confDec (0-n) [required, formNo, URI]
2.6.3.2 specPerm (0-n) [required, formNo]
2.6.3.3 restrctn (0-n)
2.6.3.4 contact (0-n) [required, formNo, URI]
2.6.3.5 citReq (0-n)
2.6.3.6 deposReq (0-n)
2.6.3.7 conditions (0-n)
2.6.3.8 disclaimer (0-n)
2.6.4 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.7 metadataAccs (0-n)
2.7.1 typeOfAccess (0-1)
2.7.2 useStmt (0-n)
2.7.2.1 confDec (0-n) [required, formNo, URI]
2.7.2.2 specPerm (0-n) [required, formNo]
2.7.2.3 restrctn (0-n)
2.7.2.4 contact (0-n) [required, formNo, URI]
2.7.2.5 citReq (0-n)
2.7.2.6 deposReq (0-n)
2.7.2.7 conditions (0-n)
2.7.2.8 disclaimer (0-n)
2.7.3 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
2.8 othrStdyMat (0-n)
2.8.1 relMat (0-n) [callno, label, media, type]
2.8.2 relStdy (0-n)
2.8.3 relPubl (0-n)
2.8.4 othRefs (0-n)
2.8.4.1 citation (1-1) --SEE Citation Contents section 1.1 inclusive--
2.9 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]

3.1 fileTxt (0-n)
3.1.1 fileName (0-n)
3.1.2 fileCitation (0-1) --SEE Citation Contents section 1.1 inclusive--
3.1.3 dataFingerprint (0-n) [type]
3.1.3.1 digitalFingerprintValue (1-1)
3.1.3.2 algorithmSpecification (0-1)
3.1.3.3 algorithmVersion (0-1)
3.1.4 fileCont (0-1)
3.1.5 fileStrc (0-1) [type, otherType, fileStrcRef]
3.1.5.1 recGrp (0-n) [recGrp, rectype, keyvar, rtypeloc, rtypewidth, rtypevtype, recidvar]
3.1.5.1.1 labl (0-n) [level, vendor, country, sdatrefs]
3.1.5.1.2 recDimnsn (0-1) [level]
3.1.5.1.2.1 varQnty (0-1)
3.1.5.1.2.2 caseQnty (0-1)
3.1.5.1.2.3 logRecL (0-1)
3.1.5.2 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
3.1.6 dimensns (0-1)
3.1.6.1 caseQnty (0-n)
3.1.6.2 varQnty (0-n)
3.1.6.3 logRecL (0-n)
3.1.6.4 recPrCas (0-n)
3.1.6.5 recNumTot (0-n)
3.1.7 fileType (0-n) [charset]
3.1.8 format (0-n)
3.1.9 filePlac (0-n)
3.1.10 dataChck (0-n)
3.1.11 ProcStat (0-n)
3.1.12 dataMsng (0-n)
3.1.13 software (0-n) [version]
3.1.14 verStmt (0-n)
3.2 fileDerivation (0-1)
3.2.1 fileCommand (0-n) [fileDerivationCasesType]
3.2.1.1 drvdesc (0-1)
3.2.1.2 drvcmd (1-n) [syntax]
3.2.1.3 fileDerivationVars (0-n)
3.2.1.3.1 keep (0-1)
3.2.1.3.2 drop (0-1)
3.2.1.3.3 add (0-1)
3.3 locMap (0-1)
3.3.1 dataItem (0-n) [varRef, nCubeRef, access]
3.3.1.1 CubeCoord (0-n) [refs, authorizedCodeValue]
3.3.1.2 physLoc (0-n) [type, recRef, startPos, width, endPos]
3.4 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]

4.1 varGrp (0-n) [type, otherType, var, varGrp, name, sdatrefs, methrefs, pubrefs, access, nCube]
4.1.1 labl (0-n) [level, vendor, country, sdatrefs]
4.1.2 txt (0-n) [level, sdatrefs]
4.1.3 concept (0-n) [vocab, vocabURI, vocabInstanceURI]
4.1.4 defntn (0-n)
4.1.5 universe (0-n) [level, clusion]
4.1.6 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.2 nCubeGrp (0-n) [name, sdatrefs, methrefs, pubrefs, access, dmnsQnty, cellQnty]
4.2.1 labl (0-n) [level, vendor, country, sdatrefs]
4.2.2 txt (0-n) [level, sdatrefs]
4.2.3 concept (0-n) [vocab, vocabURI, vocabInstanceURI]
4.2.4 defntn (0-n)
4.2.5 universe (0-n) [level, clusion]
4.2.6 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.3 var (0-n) [name, wgt, wgt-var, weight, qstn, files, vendor, dcml, intrvl, rectype, sdatrefs, methrefs, pubrefs, access, aggrMeth, otherAggrMeth, measUnit, scale, origin, nature, additivity, otherAdditivity, temporal, geog, geoVocab, catQnty, representationType, otherRepresentationType]
4.3.1 location (0-n) [StartPos, EndPos, width, RecSegNo, fileid, locMap]
4.3.2 labl (0-n) [level, vendor, country, sdatrefs]
4.3.3 imputation (0-n)
4.3.4 security (0-n)
4.3.5 embargo (0-n) [event, format]
4.3.6 respUnit (0-n)
4.3.7 anlysUnit (0-n)
4.3.8 qstn (0-n) [qstn, var, seqNo, sdatrefs, responseDomainType, otherResponseDomainType]
4.3.8.1 preQTxt (1-1)
4.3.8.2 qstnLit (1-1) [sdatrefs]
4.3.8.3 postQTxt (1-1)
4.3.8.4 forward (1-1) [qstn]
4.3.8.5 backward (1-1) [qstn]
4.3.8.6 ivuInstr (1-1)
4.3.9 valrng (0-n) [access]
4.3.9.1 item (0-n) [UNITS, VALUE]
4.3.9.2 range (0-n) [UNITS, min, minExclusive, max, maxExclusive]
4.3.9.3 key (0-n)
4.3.9.4 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.3.10 invalrng (0-n) [access]
4.3.10.1 item (0-n) [UNITS, VALUE]
4.3.10.2 range (0-n) [UNITS, min, minExclusive, max, maxExclusive]
4.3.10.3 key (0-n)
4.3.10.4 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.3.11 undocCod (0-n)
4.3.12 universe (0-n) [level, clusion]
4.3.13 TotlResp (0-n)
4.3.14 sumStat (0-n) [wgtd, wgt-var, weight, type, access, otherType]
4.3.15 txt (0-n) [level, sdatrefs]
4.3.16 stdCatgry (0-n) [URI, access]
4.3.17 catgryGrp (0-n) [missing, missType, catgry, catGrp, levelno, levelnm, compl, excls]
4.3.17.1 labl (0-n) [level, vendor, country, sdatrefs]
4.3.17.2 catStat (0-n) [type, otherType, URI, methrefs, wgtd, wgt-var, weight, sdatrefs, access]
4.3.17.3 txt (0-n) [level, sdatrefs]
4.3.18 catgry (0-n) [missing, missType, country, sdatrefs, access, excls, catgry, level]
4.3.18.1 catValu (0-1)
4.3.18.2 labl (0-n) [level, vendor, country, sdatrefs]
4.3.18.3 txt (0-n) [level, sdatrefs]
4.3.18.4 catStat (0-n) [type, otherType, URI, methrefs, wgtd, wgt-var, weight, sdatrefs, access]
4.3.18.5 mrow (0-1)
4.3.18.5.1 mi (0-n) [varRef]
4.3.19 codInstr (0-n)
4.3.20 verStmt (0-n)
4.3.21 concept (0-n) [vocab, vocabURI, vocabInstanceURI]
4.3.22 derivation (0-1) [var]
4.3.22.1 varRange (0-n) [start, end]
4.3.22.2 drvdesc (0-n)
4.3.22.3 drvcmd (0-n) [syntax]
4.3.23 varFormat (0-1) [type, formatname, schema, otherSchema, category, otherCategory, URI]
4.3.24 geoMap (0-n) [URI, mapformat, levelno]
4.3.25 catLevel (0-n) [levelnm, geoMap]
4.3.26 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.4 nCube (0-n) [name, sdatrefs, methrefs, pubrefs, access, dmnsQnty, cellQnty]
4.4.1 location (0-n) [StartPos, EndPos, width, RecSegNo, fileid, locMap]
4.4.2 labl (0-n) [level, vendor, country, sdatrefs]
4.4.3 txt (0-n) [level, sdatrefs]
4.4.4 universe (0-n) [level, clusion]
4.4.5 imputation (0-n)
4.4.6 security (0-n)
4.4.7 embargo (0-n) [event, format]
4.4.8 respUnit (0-n)
4.4.9 anlysUnit (0-n)
4.4.10 verStmt (0-n)
4.4.11 purpose (0-n) [abbr, affiliation, role, personalID, typeOfPersonalID]
4.4.12 dmns (0-n) [rank, varRef]
4.4.12.1 cohort (0-n) [catRef, value]
4.4.12.1.1 range (0-n) [UNITS, min, minExclusive, max, maxExclusive]
4.4.13 measure (0-n) [varRef, aggrMeth, otherAggrMeth, measUnit, scale, origin, additivity]
4.4.14 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
4.5 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]

5.1 typeOfSetAvailability (0-n)
5.2 labl (0-n) [level, vendor, country, sdatrefs]
5.3 txt (0-n) [level, sdatrefs]
5.4 notes (0-n) [type, subject, level, resp, sdatrefs, parent, sameNote]
5.5 table (0-n)
5.6 citation (0-1) --SEE Citation Contents section 1.1 inclusive--
5.7 otherMat (0-n) --RECURSIVE--

Deprecated Content

xml-lang

DO NOT USE THIS ATTRIBUTE. Its inclusion is an error that was persisted to retain backward compatibility. If this attribute has been used, transfer the content to xml:lang.

Link

This element permits encoders to provide links from any arbitrary element containing Link as a subelement to other elements in the codebook. The use of this element has been deprecated; use the provided object references such as varRefs, sdatrefs, methrefs, and pubrefs instead. Internal references within texts in structured content can be made with XHTML options.

ExtLink

This element permits encoders to provide links from any arbitrary element containing ExtLink as a subelement to electronic resources outside the codebook. The use of this element has been deprecated and the use of various otherMat types is recommended. A parent element can frequently use sdatrefs, methrefs, or pubrefs to refer to the appropriate other material type, which can hold the title, description, and URI for the external source.

Identification

DDI Codebook approaches identification in two ways: internal identification and unique identification. This reflects the historic usage of the XML-defined xs:ID in the original DTD form of the standard, and the later development of the DDI URN structure.

The following group of identification related attributes are available on most Codebook elements:

Field Name | Type | Content
ID | xs:ID | An ID for internal use according to XML. This is the ID to enter when referencing using IDRef or IDRefs
elementVersion | xs:string | The version number of the element
elementVersionDate | dateSimpleType | The date associated with the version number using a simple ISO structured date yyyy-mm-dd
ddiLifecycleUrn | xs:anyURI | The DDI structured URN of the related Lifecycle element. Use when transforming metadata from or to DDI Lifecycle
ddiCodebookUrn | xs:anyURI | The DDI structured URN of this element (to support external references to this element)

Internal identification for the purpose of referencing within the XML instance

The ID is defined as xs:ID (type xs:NCName) and must be unique within the instance. An ID cannot begin with a digit and cannot contain a colon (:).
Standard XML validators will check that each xs:ID is unique and that each xs:IDREF or xs:IDREFS references an existing xs:ID within the instance.
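
The checks described above can be sketched in a few lines. This is an illustration only, not a replacement for a schema validator: the pattern is an ASCII-only approximation of xs:NCName (the real definition also admits many non-ASCII characters), and the function name and message strings are hypothetical.

```python
import re

# ASCII-only approximation of xs:NCName: a letter or underscore first,
# then letters, digits, '.', '-', or '_'. No colon is allowed anywhere.
# Restricting to ASCII is an assumption of this sketch.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def check_ids(ids, idrefs):
    """Return a list of problems found in the given ID and IDREF values."""
    problems = []
    seen = set()
    for value in ids:
        if not NCNAME_ASCII.fullmatch(value):
            problems.append(f"invalid ID: {value}")
        if value in seen:
            problems.append(f"duplicate ID: {value}")
        seen.add(value)
    for ref in idrefs:
        if ref not in seen:
            problems.append(f"unresolved reference: {ref}")
    return problems
```

For example, `check_ids(["1V", "V1", "V1"], ["V9"])` reports an invalid ID, a duplicate ID, and an unresolved reference.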

The two attributes elementVersion and elementVersionDate allow you to provide a specific version number and/or version date. Note that only one version of an element should be included in the instance, because XML validation does not differentiate between version numbers and will treat a repeated ID as a non-unique xs:ID.

Unique identification to support external referencing

The attribute ddiCodebookUrn allows you to provide a DDI structured unique identifier for the Codebook element using the standard DDI URN structure (urn:ddi:[agency identifier]:[element identifier]:[version number]). This should be used if your intention is to provide access to the element from outside the instance.
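
As a minimal sketch of the URN pattern given above, the following helpers assemble and split a URN from its three parts. The function names are hypothetical, and the sketch assumes that none of the parts contains a colon; the full DDI URN rules may impose further constraints that are not checked here.

```python
def build_ddi_urn(agency, element_id, version):
    """Assemble urn:ddi:[agency]:[element]:[version] from its three parts."""
    parts = (agency, element_id, version)
    if any(not p or ":" in p for p in parts):
        raise ValueError("URN parts must be non-empty and contain no colon")
    return "urn:ddi:" + ":".join(parts)


def split_ddi_urn(urn):
    """Split a URN of the shape above into (agency, element, version)."""
    fields = urn.split(":")
    if len(fields) != 5 or fields[:2] != ["urn", "ddi"]:
        raise ValueError(f"not a DDI URN of the expected shape: {urn}")
    return tuple(fields[2:])
```

With the made-up values `build_ddi_urn("example.org", "V1", "1.0")`, the result is `urn:ddi:example.org:V1:1.0`.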

The attribute ddiLifecycleUrn is provided for the purpose of translation between the two DDI products, Codebook and Lifecycle. Use it to indicate the DDI URN of the Lifecycle element that served as the basis for the Codebook element.

Support for Controlled Vocabularies

When initially created, Codebook added a number of attributes to descriptive text fields that were intended to support future controlled vocabularies. It was thought at the time that controlled vocabularies would be listed internally as simple enumerations. As the technology changed and XML moved to the standardized use of schemas, external controlled vocabularies became the dominant approach, and new structures were needed to support them. The "concept" element is the standardized means of providing the information needed to access an external controlled vocabulary and validate its use. Version 2.5 provided a means of noting the external vocabulary used at any point in the Codebook. Version 2.6 adds the ability to provide a direct link to the term within a controlled vocabulary used at a specific point. The standard form for controlled vocabularies identified within Codebook is based on the "concept" element. This may be expressed by an element of type="conceptType" or through the use of a "conceptualTextType", which includes the option of using a "concept" in conjunction with descriptive text. This has resulted in a number of options for expressing the use of external controlled vocabularies.

Best Practices:

Existing documentation making use of various attributes intended to contain controlled vocabularies should be treated as terms with unspecified controlled vocabulary usage. If the controlled vocabulary is known, and the element supports the use of "concept", replicate the value in the attribute in "concept" and add the information identifying the controlled vocabulary used.

New documentation should use the "concept" option rather than the original attribute. If attributes (generally a “type” attribute) were used in the past, these were not connected to external controlled vocabularies and still lack that capability. If they are from a controlled vocabulary the information may be repeated in the new “concept” option. Note that some search systems may be using the attribute as a search term. If so, it is best to use both the “concept” field to capture full information and replicate the value in the attribute to support systems currently in use. The concept field can also be used in conjunction with “otherValue” when declaring an alternate value with an internal vocabulary.

New documentation wishing to use controlled vocabularies where the "concept" option is not available should use the "controlledVocabulary" structure and associate it with the element or attribute containing the value of the controlled vocabulary.

Concepts are extensions of simple string types and are repeatable. Multilanguage documents should enter a concept for each language as needed.

The example is for the controlled vocabulary term “SF: 311-312 draft horses”, which is contained in the string portion of the element.

Attribute | Content | Example
vocab | xs:string | LCSH
vocabURI | xs:string | http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html
vocabInstanceURI | xs:string | http://lcweb.loc.gov/catdir/cpso/lcco#SF311-312
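
To show how the three attributes and the term fit together, the sketch below builds a single "concept" element from the example values and serializes it. It uses only Python's standard library and omits the DDI namespace for brevity, which is a simplification of this sketch rather than a statement about the schema.

```python
import xml.etree.ElementTree as ET

# Attribute values are taken from the example table; the term itself goes
# in the string portion of the element.
concept = ET.Element("concept", {
    "vocab": "LCSH",
    "vocabURI": "http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html",
    "vocabInstanceURI": "http://lcweb.loc.gov/catdir/cpso/lcco#SF311-312",
})
concept.text = "SF: 311-312 draft horses"

# Serialize to a text string rather than bytes.
xml_text = ET.tostring(concept, encoding="unicode")
print(xml_text)
```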

The string type named “conceptualTextType” is a mixed content type and may include a “concept” element within its text string. These are found in the following locations.

Common Structural Types

String Type

Simple String Type

Conceptual Text Type

Table and Text Type

Simple Text and Date Type

Integer Type

Material Reference Type

Geographic Information

The elements that support geographic information are primarily found in the Summary Description. These include:

Geographic Bounding Box (geoBndBox)

Geographic Unit (geogUnit)

Geographic Coverage (geogCover)

This section provides best practices in the use of these fields to provide detailed information on geographic features in a consistent and manageable way. The fields to describe geography have been available since version 2.1.

Geographic Bounding Box

A bounding box is a listing of the North and South Latitudes and East and West Longitudes. It is used by spatial search systems using a geographic point. It is a quick, easily resolvable means of determining whether or not the point falls within the area covered by the dataset.
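
The point test a spatial search system performs against a bounding box can be sketched as follows. The function name is hypothetical; the four limits correspond to westBL, eastBL, southBL, and northBL. The sketch assumes decimal degrees and a box that does not cross the 180° meridian, and the coordinates in the usage note are made up.

```python
def in_bounding_box(lat, lon, west, east, south, north):
    """Return True if the point (lat, lon) lies inside or on the box.

    Assumes decimal degrees and west <= east, i.e. the box does not
    wrap around the 180-degree meridian.
    """
    return south <= lat <= north and west <= lon <= east
```

For a box with west=-10.0, east=10.0, south=0.0, north=20.0, the point (12.0, -1.5) falls inside and the point (25.0, -1.5) does not.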

Note that the non-repeatable geoBndBox is located within the repeatable Summary Description (sumDscr). The bounding box should be provided only once and should cover the full composite area described by one or more sumDscr geographies. Place it in a broad sumDscr, with additional sumDscrs holding time/nation pairs if needed. If additional bounding box information is desired to identify sub-areas, the Bounding Polygon option should be used, as this can be expressed as a minimal area of a bounding box. Multiple geoBndBox fields are ignored by most spatial search systems and may cause confusion and inaccurate search results.

Associating Geographic Locations with Specific Time Periods

Replication of sumDscr is useful when bundling specifics like timePrd, nation, and universe for specific samples within a larger project. A clear example of this is the description of the various IPUMS projects that harmonize multiple samples of census, health, and related data. IPUMS identifies the coverage of individual samples within a project using a combination of these three elements to differentiate between samples:

<sumDscr>
<timePrd date="2014">2014</timePrd>
<nation>Burkina Faso</nation>
<universe>Women</universe>
</sumDscr>
<sumDscr>
<timePrd date="2014">2014</timePrd>
<nation>Burkina Faso</nation>
<universe>Children</universe>
</sumDscr>
<sumDscr>
<timePrd date="2018">2018</timePrd>
<nation>Burkina Faso</nation>
<universe>Women</universe>
</sumDscr>
<sumDscr>
<timePrd date="2018">2018</timePrd>
<nation>India</nation>
<universe>Women</universe>
</sumDscr>
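
The following sketch shows why the three elements together can tell samples apart: each sumDscr reduces to a (time period, nation, universe) key, and the keys are distinct even though individual values repeat. The stdyInfo wrapper, the omission of the DDI namespace, and the function name are simplifications assumed for the illustration; the element name timePrd follows the tree structure earlier in this document.

```python
import xml.etree.ElementTree as ET

XML = """<stdyInfo>
  <sumDscr><timePrd date="2014">2014</timePrd><nation>Burkina Faso</nation><universe>Women</universe></sumDscr>
  <sumDscr><timePrd date="2014">2014</timePrd><nation>Burkina Faso</nation><universe>Children</universe></sumDscr>
  <sumDscr><timePrd date="2018">2018</timePrd><nation>Burkina Faso</nation><universe>Women</universe></sumDscr>
  <sumDscr><timePrd date="2018">2018</timePrd><nation>India</nation><universe>Women</universe></sumDscr>
</stdyInfo>"""


def sample_keys(xml_text):
    """Return one (timePrd, nation, universe) tuple per sumDscr."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("timePrd"), s.findtext("nation"), s.findtext("universe"))
        for s in root.iter("sumDscr")
    ]


keys = sample_keys(XML)
# Every sample has its own key, so the triple identifies the sample.
all_distinct = len(set(keys)) == len(keys)
```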

Geographic Hierarchies

Both geogCover and geogUnit are conceptual types and can therefore reference geographic statistical classifications (geographic area type as well as geographic locations) at multiple levels. When defining hierarchies, specify a location and the parent geographic area.

<nation abbr="us">United States</nation>

<geogCover><concept vocabURI etc.>State/Province</concept>California</geogCover>

<geogCover><concept vocabURI etc.>City</concept>San Jose</geogCover>

<geogCover><concept vocabURI etc.>Other</concept>Silicon Valley</geogCover>

Repeat sumDscr for each set.

NCubes

Description

NCube is the DDI structure for describing dimensional data: cross-tabulations, tabular data, aggregations, etc. The nCube describes the structure of the table, providing a label, universe, measure type(s), dimensions, and attributes, as well as the relationship between the cells of the table. Note that a table can be 1 to n dimensional. NCubes use variables to define each dimension of the table. Each cell in the table intersects with each dimension at one and only one value. Note that some visual "tables" may be composed of multiple NCubes with a common dimension. This is generally done for display purposes and/or to save printing space. The table below is composed of two nCubes: Age by Sex and Poverty by Sex.

 | Age < 18 years | Age 18 and over | Below Poverty | At or Above Poverty
Male | | | |
Female | | | |

Dimension definitions

Each dimension is assigned a number and linked to the variable describing it using "dmns". The variable is described using a code list, and the codes are used as the intercept value of a cell with a dimension. The cell is identified by an array of the intercept values for each dimension, in dimension order. This allows the user to reorganize the display of the table without changing the cell address. In the following table Age is dimension 1 and Sex is dimension 2. The cell address is presented in that order.

 | Age < 18 years | Age 18 and over
Male | 1,1 | 2,1
Female | 1,2 | 2,2
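
The addressing scheme can be sketched as a mapping from (Age code, Sex code) to a cell value. Swapping rows and columns changes only the layout; every value is still fetched by the same address. The codes follow the table above, while the counts and the function name are made up for the illustration.

```python
# Codes for each dimension, in dimension order (1 = Age, 2 = Sex).
AGE = {1: "Age < 18 years", 2: "Age 18 and over"}
SEX = {1: "Male", 2: "Female"}

# Cell values keyed by (age code, sex code). The counts are invented.
cells = {(1, 1): 120, (2, 1): 340, (1, 2): 115, (2, 2): 360}


def render(cells, row_dim):
    """Lay the cube out with dimension 1 (Age) or 2 (Sex) as the rows."""
    row_codes, col_codes = (AGE, SEX) if row_dim == 1 else (SEX, AGE)
    table = []
    for r in row_codes:
        row = []
        for c in col_codes:
            # The address is always (age, sex), whatever the layout.
            address = (r, c) if row_dim == 1 else (c, r)
            row.append(cells[address])
        table.append(row)
    return table
```

`render(cells, 2)` reproduces the layout of the table above (Sex as rows), and `render(cells, 1)` transposes it without any cell address changing.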

Location Map for defining location of cell contents in a data file

The location field of each nCube is expressed as locMap="IDRef", using the ID of the location map (locMap) that defines the link between the nCube cell and the physical location of its stored value. The data item provides a reference to the nCube and each cube coordinate (dimension number and intercept value). Physical location is defined in the same way as for a variable. The locMap can be used by both microdata variables and nCube cells, making it a single consistent means of linking a description to a data item.
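
A rough sketch of the lookup a location map enables: given an nCube reference and cube coordinates, find the physical location and read the value from a fixed-width record. The in-memory list below is a hypothetical stand-in for a locMap, not the XML structure itself. The key names echo nCubeRef, startPos, and width from the tree structure; the record layout, the values, and the assumption that startPos is 1-based are all invented for the illustration.

```python
# Hypothetical stand-in for a locMap: one entry per data item, naming the
# nCube, the cell's cube coordinates, and where the value sits in a record.
LOC_MAP = [
    {"nCubeRef": "NC1", "coords": (1, 1), "startPos": 1, "width": 4},
    {"nCubeRef": "NC1", "coords": (2, 1), "startPos": 5, "width": 4},
    {"nCubeRef": "NC1", "coords": (1, 2), "startPos": 9, "width": 4},
    {"nCubeRef": "NC1", "coords": (2, 2), "startPos": 13, "width": 4},
]


def read_cell(record, ncube_ref, coords):
    """Return the stored text for one nCube cell from a fixed-width record."""
    for item in LOC_MAP:
        if item["nCubeRef"] == ncube_ref and item["coords"] == coords:
            start = item["startPos"] - 1  # assumes startPos counts from 1
            return record[start:start + item["width"]].strip()
    raise KeyError((ncube_ref, coords))


record = " 120 340 115 360"
```

Here `read_cell(record, "NC1", (2, 1))` reads columns 5 to 8 of the record.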