List of initial terms to keep (completed Feb 9, 2022)
updated with additions 2022-02-23
- in process of adding in new terms
put the mission statement at the top of the glossary
put a note in about going to another page to refer to software used to interpret DDI
therefore no software will be in the Glossary
we have to remember that the terms below are specifically relating to DDI
we will be referring to the original glossary and building on it
note - these descriptions are more descriptive
Archive (n.)
look in OAIS spec (https://en.wikipedia.org/wiki/Open_Archival_Information_System )
OAIS: Archive: An organization that intends to preserve information for access and use by a
Designated Community. (p. 1-9) ISO 14721 (2003, 2012)
any organization that maintains data for the long haul, including preservation (this is what it means for Lifecycle)
CASRAI
"A physical place or digital location containing curated static records and data. Set up and managed to established standards, (e.g. ISAD(G) https://www.ica.org/en/isadg-general-international-standard-archival-description-second-edition or Core Trust Seal https://www.coretrustseal.org/ that ensure long term integrity, security, authenticity and accessibility of the records and data"
draws on ICA (International Council on Archives) definition
DDI-L: A maintainable module containing information related to the archiving (long term access and/or preservation) of the data and metadata.
Codebook (for the purposes of DDI)
what is a codebook
codebook standard under DDI
DDI-C is an XML representation of a codebook
includes a lot of extra metadata about the study
questionnaire, how the study was conducted, …
conceptual variable
GSIM
CDI
DDI-L
Controlled vocabulary
how they get used in DDI
to configure the standard used
CASRAI: A list of standardized terminology, words, or phrases, used for indexing or content analysis and information retrieval, usually in a defined information domain.
Cross-Domain Integration
also known as CDI, DDI-CDI
a DDI product
a nascent standard in the DDI family
Correspondence table - see Crosswalk
GSIM
used to map similar things in collections
used in classifications specifically
Crosswalk - see Correspondence table
GSIM
Data Documentation Initiative
The Data Documentation Initiative (DDI) is a suite of open, human-readable, and machine-actionable specifications used internationally for describing the data produced with surveys and other observational methods in the social, behavioral, economic, and health domains
Data lifecycle
The stages of the data production and management process to support research and policy covering conceptualization, design, acquisition, processing, analysis, sharing, and archiving
Datum
A piece of information
[DDI-CDI definition]
[DDI Lifecycle definition]
DDI agency
DDI instance
1. The root element of a related set of DDI metadata in the DDI Lifecycle XML Schema
2. In general use, any XML instance containing DDI metadata
DDI Lifecycle Model
put a link in to the canonical model, that is the tech cttee version - have to ask them where it is!
DDI Lite
A simple subset of DDI Codebook elements with basic information describing a dataset. (See https://ddialliance.org/specification/ddi2.1/lite/index.html )
DDI Profile (i.e., Lite, etc.)
1. A selection of metadata fields conforming to the DDI Codebook, DDI Lifecycle, or DDI-CDI specification for use by a particular community or for a specific application
2. In DDI Lifecycle, a formal XML expression of the elements used by a particular community or application
[NOTE: Provide good examples]
DDI scheme
For DDI Lifecycle, a package of related metadata items of a single type (e.g., concepts, variables, categories) for the purposes of data/metadata management and reuse, owned and maintained by the DDI agency
dimensional data
Synonyms: multidimensional data, data cube, N-Cube
Data organized according to multiple axes which act as a coordinate system for identifying and describing individual datums. [?]
DISCO [this should be in the acronyms list also]
The DDI-RDF Discovery Vocabulary, which is a standard set of metadata generalized from DDI Codebook and DDI Lifecycle for supporting Web searches for data using the W3C linked data technologies. Note that DISCO is still under development.
Discovery
need a definition to be clear on DDI usage. Ability to uncover resources described using DDI metadata.? Identifying programmatically the relevant resources (datasets, studies?) for a specific research purpose. (from DDI-RDF vocabulary web page). ‘Find’ part of FAIR.
Dissemination
From Lifecycle. Focus on usage in DDI context. See paper, page 5 terminology with possible definition.
DTD
Acronym, document type definition. Document description language that largely predates XML schema. First version of Codebook was a DTD. Note that term has been superseded/is archaic. See DDI taxonomy page.
External reference – link to resource that is external to metadata instance (e.g. a vocabulary concept) (2022-01-26). needs thought…, review. need to clarify if is reference to DDI concept or external to DDI. both the DDI technical sense of reference to other DDI metadata, and a general sense of reference of non-DDI things
Genericode
http://docs.oasis-open.org/codelist/cs-genericode-1.0/doc/oasis-code-list-representation-genericode.html “OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information.” Need to determine if DDI codelists use this encoding, if not, remove term.
Identifiable
used in Lifecycle, a class of things that have an identifier. Motivated by need to reference or reuse some information object.
IHSN toolkit
International Household Survey Network application of DDI Codebook. Software tools for working with the application. Term might need to be updated to the label currently used, perhaps “IHSN Microdata Management Toolkit”.
Inclusion inline vs. by reference need to look at how this is presented in specs, but need clarity on external reference, Internal publication of DDI schemes; glossary should have the same term (label) that is used in the specifications. [make positive statement to effect ‘ddi lifecycle XML uses references between instances and sources of metadata to enable reuse. Publication of DDI schemes supports this functionality’]
instance variable - variable in the context of a particular dataset; define with this approach: ‘conceptual variable is…’, ‘represented variable is conceptual variable with…’, ‘instance variable is a represented variable as used in a dataset…, denotes inclusion of information about source of data (context…)’. Inherited from GSIM. Appears in DDI Lifecycle and CDI. Most granular element in variable cascade.
instrument - implementable mechanism for collecting data. Notes - typically a questionnaire or sensor;
Interoperability - as defined in the FAIR principles (with a link to GoFair Principles - https://www.go-fair.org/fair-principles/ ). several aspects: data, instruments, semantics, system, syntax. The capacity for systems (things, agents) to interact meaningfully and correctly. [System is construed broadly to include any kind of interacting agent..]
key-value data (datastore, structure) - data in which each value is associated with an identifying field (string). Add note that identifying field (key) is considered unitary; key does not imply any internal structure
Lifecycle - see above (that is, Data lifecycle, DDI lifecycle, Survey lifecycle per GSBPM)
Logical record - the schema for the content of an information item (record), in contrast to physical record. Tells what is in record, how they are related. Physical record defines format, specific representation. NOTE also look to see if GSIM has a logical record - if it does, need to mention it. Need to investigate if there are inconsistent uses of the term in DDI standards, and point these out.
Long data - like an event history; rows are variables, column has unit. Allows adding new variables without adding columns. Similar to rdf triples, 5th or 6th normal form dbms. Need to be able to reference registry of variables.
Machine-actionable - see https://ddialliance.org/taxonomy/term/198 “information that is structured in a consistent way so that machines, or computers, can be programmed against the structure.” [finish here 2022-05-18]
Maintainable (still used in DDI Lifecycle - like a database table of items that are maintained as a whole)
Major version
Minor version
Metadata - social science, behavioral definition
NADA cataloging tool - NADA is an open source microdata cataloging system, compliant with the Data Documentation Initiative (DDI) and Dublin Core’s RDF metadata standards. https://nada.ihsn.org/
N-Cubes - multi-dimensional data cubes used in DDI Codebook
Nesstar - even though it is software, it is used for DDI
Physical record - physical recording of the values of the logical record
questionnaire
Register - a list
administrative data holding information that can be used for research on the subjects, e.g., tax, births and deaths
Registry - catalogue that can be used to find data, e.g., SDMX
ISO/IEC 11179
Repository - place where data and metadata holdings are maintained and distributed, e.g., an archive
represented variable
Resource package - in DDI Lifecycle, a special construction, not for a specific dataset
check to make sure it is still used
SDTL - [this should be in the acronyms list also] an independent intermediate language for representing data transformation commands (from III. Purpose in https://ddi-alliance.atlassian.net/wiki/download/attachments/860815393/Part_1_DDI-CDI_Intro_PR_1.pdf )
this is a DDI product, https://ddialliance.org/products/sdtl/1.0
series – use definition in DDI Lifecycle
statistical classification – add note on clarification of ‘statistical’ vs. other kinds of classification. Classification should be exclusive and exhaustive. See usage in XKOS, CDI, Lifecycle, and GSIM.
study
survey
unit of measurement
unit type
universe
URI - in relation to DDI
URL - in relation to DDI
URN - in relation to DDI
variable cascade
variable types (in context with Lifecycle, Codebook, …)
Versionable
Versioning - a technical specification in DDI
talk about the different specs
wide data
XKOS - this should be in the acronyms list also
this is a DDI product
XML Schema
Possible Terms to add
First draft 2023-07-26
ISO 704 standard: Terminology work — Principles and methods https://edisciplinas.usp.br/pluginfile.php/312607/mod_resource/content/1/ISO 704.pdf
DDI documents in the scope of this glossary. Definitions are intended to apply to usage or understanding of these documents:
Lifecycle
CDI
Conventions:
Terms are written using lower case unless they are proper nouns in which case they are capitalized.
Context for usage of a term is indicated with angle brackets ('<','>').
Terms defined in this glossary are indicated in bold when used in definitions.
Definitions are written with the principle of substitution in mind. The definition can replace the defined term in any place where it is used.
Glossary:
archive (n.)
collection of records, objects, metadata, or data intended for retention for as long as necessary
NOTE: Archives are typically managed based on established standards. Retention implies maintenance of the integrity, security, authenticity, and accessibility of the collection items according to policies established by the archive.
administrative metadata (n.)
content that is related to the interaction or use of the metadata within a specific system
category
concept used to group objects
the meaning of a category is based on the unifying characteristics of its group of objects
classification scheme
an organized set of categories defined within some scope
Codebook [standard](n.)
synonym: DDI-C
DDI standard defining a schema for a simple description of a study without the capability to link to descriptions of other studies
NOTE: Originally DTD-based, DDI-C is an XML Schema for validating XML serialization of Codebook standard. Supports discovery, preservation, and the informed use of data. Defines descriptive content for variables, files, source material, and study level information. Includes metadata about a study; e.g. questionnaire, how the study was conducted.
codebook [instance] (n.)
instance of a study description conforming to the Codebook standard
codebook (n.)
description of the methodology, questions, variables, codelists, and other aspects of a study and the data produced
codelist
list of code-category pairs, where a unique code represents each category
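Illustrative sketch (not taken from any DDI specification): a codelist can be modeled as a mapping from codes to categories, with a check that no code is used twice. The codes, categories, and the function name build_codelist are assumptions made up for this example.

```python
# Hypothetical code-category pairs for a marital-status variable.
pairs = [("1", "Single"), ("2", "Married"), ("3", "Widowed")]


def build_codelist(pairs):
    """Return a code -> category mapping, rejecting duplicate codes."""
    codelist = {}
    for code, category in pairs:
        if code in codelist:
            raise ValueError(f"duplicate code: {code!r}")
        codelist[code] = category
    return codelist


codelist = build_codelist(pairs)
print(codelist["2"])  # Married
```

Rejecting duplicates at construction time is one way to enforce the "unique code" requirement in the definition; other designs are equally possible.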
conceptual variable
description of the semantics of a variable independent of any particular representation or implementation
NOTE: A reusable, partial description of a variable limited to semantics. Semantics includes the meaning of values in the value domain, the characteristic represented, and all the associations between units and assigned values. Top (most general) level in the variable cascade, above represented variable and instance variable. Used in DDI Lifecycle, DDI-CDI, and GSIM.
controlled vocabulary
list of standardized terminology, words, or phrases used for indexing, content analysis, or information retrieval (CASRAI, CODATA)
NOTE: Usually in a defined information domain. For DDI, controlled vocabularies are standardized under the DDI Alliance. Other controlled vocabularies might not be formal standards, but are agreed upon in some context.
Controlled Vocabulary [DDI]
controlled vocabulary that can be used with DDI as well as for other purposes and applications
NOTE: Used in DDI work products to define metadata elements. These are themselves work products of the DDI Alliance.
correspondence
relationship that asserts a degree of similarity between two concepts
NOTE: The term crosswalk is sometimes used to refer to an individual correspondence.
correspondence table
Synonym: crosswalk
set of correspondences between two distinct sets of concepts
NOTE: used to map similar things in collections. Concepts might be represented by terms in a vocabulary or classification system, fields in a database, entities, characteristics, or properties in a data model.
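Illustrative sketch (an assumption, not a DDI or GSIM structure): a correspondence table can be held as a list of correspondences, each linking one concept in a source set to one concept in a target set with a label for the degree of similarity. The class name, field names, concept labels, and similarity labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    source: str      # concept in the first set
    target: str      # concept in the second set
    similarity: str  # hypothetical label for the degree of similarity


# Hypothetical mapping between two made-up education-level vocabularies.
table = [
    Correspondence("A:primary", "B:basic", "exact"),
    Correspondence("A:secondary", "B:intermediate", "partial"),
    Correspondence("A:secondary", "B:advanced", "partial"),
]


def targets_for(table, source):
    """Return every target concept that corresponds to the given source."""
    return [c.target for c in table if c.source == source]


print(targets_for(table, "A:secondary"))  # ['B:intermediate', 'B:advanced']
```

A source concept may correspond to more than one target concept, which is why the lookup returns a list.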
Cross-Domain Integration
synonym: DDI-CDI
draft standard (as of 2023-07) in the DDI suite that addresses documentation for integrating data from heterogeneous sources
crosswalk
see correspondence table
NOTE: The term crosswalk is sometimes used to refer to an individual correspondence.
Data Documentation Initiative [product]
synonym: DDI
suite of open, human-readable, and machine-actionable specifications used internationally for describing the data produced with surveys and other observational methods in the social, behavioral, economic, and health science domains
Data Documentation Initiative [activity]
synonym: DDI
organized effort to produce metadata standards for the description of social, behavioral, and economic data to foster data reuse and interoperability.
NOTE: Started in 1995, following on work started in the 1980s. Currently under the stewardship of the DDI Alliance.
data lifecycle
stages of the data production and management process to support research and policy covering conceptualization, design, acquisition, processing, analysis, sharing/dissemination, and archiving
NOTE: There are various models that emphasize different aspects of the lifecycle. One example is the model of the Digital Curation Centre (DCC). The DDI data lifecycle emphasizes aspects related to data production in the social, behavioral, economic, and health science domains.
datum
representation of a concept intended for information processing purposes
NOTE: Typically an alphanumeric string; independent of subject area; used in DDI-CDI, defined as “a designation of a value.” Commonly used as the most granular representation of information in a data processing or management system. A datum is described in DDI standards as a representation of dates, categories, numbers (quantities and percentages), and text.
DDI agency
registered entity responsible for assigning identifiers to DDI metadata items to ensure their uniqueness, and for the maintenance and versioning of those items
NOTE: May be a person, project, or organization that intends to use DDI and registers to be assigned a recognized agency identifier, which is a prefix for all the DDI identifiers it assigns; the registration service is available at https://ddialliance.org/products/ddi-agency-id-registry
DDI Instance [DDI Lifecycle]
root element (<DDIInstance>) of a related set of DDI metadata as specified in the DDI Lifecycle XML Schema
DDI instance [XML]
XML instance containing DDI metadata as specified in a DDI standard
NOTE: In non-XML syntax representations, the term “instance” may be used in a similar fashion, denoting the instantiation of a class, as in the DDI-CDI model.
DDI Lifecycle
synonym: DDI-L
statistical metadata standard describing the data lifecycle as defined in the DDI Lifecycle Model
NOTE: One of several standards from the DDI Alliance, denoted 3.x.
DDI Lifecycle Model [diagram]
depiction of the stages of the data lifecycle
NOTE: Version of the diagram from DDI lifecycle (Thomas, Gregory and Piazza, 2005). This is informative; there is no formalization of this diagram in any DDI specification.
DDI Lifecycle Model [information model]
specification of metadata used and reused throughout the data lifecycle
NOTE: Through version 3.2 this has been an XML schema. Subsequent versions are documented at https://github.com/ddialliance/ddimodel.
DDI Lite
subset of DDI Codebook elements specifying basic information describing a dataset
NOTE: Includes metadata elements common to various other widely used systems and specifications. See https://ddialliance.org/specification/ddi2.1/lite/index.html
DDI Profile [General]
selection of metadata fields conforming to a DDI specification for use by a particular community or for a specific application
DDI Profile [DDI Lifecycle]
formal XML expression of the DDI Lifecycle elements used by a particular community or application
NOTE: defined in the DDI Lifecycle specification in the element <DDIProfile>
DDI scheme [DDI Lifecycle]
package of related metadata items of a single type (e.g., concepts, variables, categories) for the purposes of data/metadata management and reuse, owned and maintained by a DDI agency
NOTE: defined in the DDI Lifecycle specification. Analogous to a database table or a lookup list.
Examples: <RepresentedVariableScheme>, <QuestionScheme>, <ConceptScheme>, <CategoryScheme>.
dimensional data
data organized as a matrix with each axis defined by a set of categories
NOTE: Synonyms: multidimensional data, data cube, N-Cube. The category sets define a coordinate system for identifying and describing individual datums, sometimes supplemented with time, or other non-categorical variables. Distinct from unit record data or sensor data.
Examples: Possible axes: Educational attainment level, age range, region, occupation, industry, and gender.
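Illustrative sketch: the coordinate-system idea can be shown with a small cube in which each cell is addressed by one category per axis. The axis names, categories, and counts are made up for the example, and total_along is a hypothetical helper.

```python
from itertools import product

# Hypothetical category sets; each one defines an axis of the cube.
axes = {
    "age_range": ["15-24", "25-64"],
    "region": ["North", "South"],
}

# Each cell is addressed by a coordinate: one category per axis.
# The counts are invented values for illustration only.
cube = {
    ("15-24", "North"): 120,
    ("15-24", "South"): 95,
    ("25-64", "North"): 410,
    ("25-64", "South"): 380,
}

# Every combination of categories identifies exactly one cell.
coordinates = list(product(*axes.values()))
complete = all(coord in cube for coord in coordinates)


def total_along(cube, axis_index, category):
    """Sum all cells whose coordinate on the given axis equals category."""
    return sum(v for coord, v in cube.items() if coord[axis_index] == category)


print(complete)                       # True
print(total_along(cube, 1, "North"))  # 530
```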
DISCO
set of metadata fields generalized from DDI Codebook and DDI Lifecycle for supporting Web searches for data using the W3C linked data technologies
discovery (noun)
location and evaluation of available data or related/supplementary resources
NOTE: Enabled by search engines, catalogs, based on metadata such as that modeled using the DDI specifications that make resources discoverable (see ‘Findable’ part of FAIR).
dissemination
distribution of either data with related/supplementary resources or metadata for the purposes of use and reuse
NOTE: This is the stage in the DDI Lifecycle Model labelled “Data Dissemination,” and a phase in the Generic Statistical Business Process Model (GSBPM).
DTD
Document Type Definition
XML tagset definition for marking up a class of documents
NOTE: Document description language that predates XML Schema. The first version of Codebook was represented using a DTD. DTDs have since been superseded by W3C XML Schema for representing DDI work products. See DDI taxonomy page.
external reference [DDI specification]
link to a resource whose content is not represented inline in a DDI instance
NOTE: This is used when making reference to resources that complement the content of the metadata, e.g. a pdf copy of a questionnaire, or other DDI metadata intended for reuse that is published elsewhere.
FAIR [acronym]
Findable, Accessible, Interoperable, Reusable in the context of data publication or sharing.
NOTE: For details, see Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3. 160018. https://doi.org/10.1038/sdata.2016.18
Genericode [OASIS]
representation standard used to publish DDI Controlled Vocabularies using XML
NOTE: http://docs.oasis-open.org/codelist/cs-genericode-1.0/doc/oasis-code-list-representation-genericode.html “OASIS Code List Representation format, “genericode”, is a single model and XML format (with a W3C XML Schema) that can encode a broad range of code list information.” Currently superseded in DDI by SKOS.
identifiable (noun) [DDI Lifecycle]
class of metadata objects that can be assigned an identifier
NOTE: used in DDI Lifecycle, motivated by need to reference or reuse some information object.
IHSN [acronym]
International Household Survey Network; an informal network of organizations using DDI Codebook
NOTE: The network provides guidance on best practices for survey development, documentation, and data management. The group has developed software tools, including the IHSN Microdata Management Toolkit. See http://ihsn.org/.
inline
represented explicitly in the XML instance
As opposed to included by reference to a separate resource. DDI Lifecycle uses schemes to enable reuse of metadata, and these can be inline or external.
instance variable
description of a variable in the context of a particular dataset
Most granular element in variable cascade
instrument
implemented mechanism for collecting data
NOTE: for example a paper form, a sensor, software that mines or collects data; could be virtual or mechanical; an instrument is a particular implementation of some design for collecting data. (https://docs.pidinst.org/en/latest/white-paper/instrument-pids.html ) See also survey instrument
interoperability
capacity of a product or system to work with other products or systems (https://en.wikipedia.org/wiki/Interoperability )
NOTE: as mentioned in the FAIR principles ( https://www.go-fair.org/fair-principles/ ). There are several aspects: data, instruments, semantics, system, syntax. The capacity to interact meaningfully and correctly. System is construed broadly to include any kind of interacting agent. Implies minimal modification to either product or system
key-value data (data structure)
data structure consisting of ordered pairs comprising a key (identifier) and value (datum)
The structure is a set of ordered key-value pairs. The identifier is a pointer used to locate the value in the data structure. Values can be nested key-value structures. Typically the key is intended to encode or reference some semantics of the value. Within the context of a dataset, the key is considered unitary; a key does not necessarily have any internal structure.
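Illustrative sketch: a Python dict is one common realization of key-value data, including nested values. The record contents and the flatten helper are assumptions for this example. After flattening, a key such as "address.city" looks structured to a human reader, but the data structure still treats it as a single unitary string.

```python
# Hypothetical record: keys identify values; one value is itself key-value data.
record = {
    "respondent_id": "R001",
    "age": 34,
    "address": {"city": "Oslo", "country": "NO"},
}


def flatten(kv, prefix=""):
    """Flatten nested key-value pairs into a single level with dotted keys."""
    flat = {}
    for key, value in kv.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = value
    return flat


flat = flatten(record)
print(flat["address.city"])  # Oslo
```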
logical record (CDI, DDI Lifecycle)
ordered set of instance variables
Used in description of structure of a dataset. Does not specify format or specific implementation.
long data
data structure in which each row contains a variable, its value, unit ID, and possibly some attribute values
Data structure consisting of separate records that associate an entity (unit) instance with one variable.
Like an event history; rows are variables. New variables are accommodated by adding rows rather than columns. The values can be qualified with attributes that add information about observation method, time, etc.
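Illustrative sketch: each long-data row below holds a unit ID, a variable, a value, and optional attributes. The unit IDs, variable names, values, and attribute keys are made up, and values_for is a hypothetical helper.

```python
# Each row: (unit ID, variable, value, attributes). All values are invented.
long_rows = [
    ("P1", "age", 34, {"method": "interview"}),
    ("P1", "income", 52000, {"method": "register"}),
    ("P2", "age", 51, {"method": "interview"}),
]

# A new variable only appends rows; the row layout itself does not change.
long_rows.append(("P2", "height_cm", 172, {"method": "measurement"}))


def values_for(rows, unit_id):
    """Collect the variable -> value pairs recorded for one unit."""
    return {variable: value for uid, variable, value, _ in rows if uid == unit_id}


print(values_for(long_rows, "P2"))  # {'age': 51, 'height_cm': 172}
```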
machine-actionable data
structured data that are represented in a way so that a machine (computer) can be programmed to read and process each datum.
This is meant to be deterministic as opposed to probabilistic (e.g., natural language processing).
macrodata
data based on aggregates rather than individuals
compare with microdata
maintainable (noun)
versionable that includes additional administrative attributes
https://ddialliance.github.io/ddimodel-web/573711d7075561ba27fd5aa825c3db32745d70fb/item-types/Maintainable/ defines the additional administrative attributes. Can be used as an adjective.
measure (noun)
quantitative variable
NOTE: A numeric as opposed to a categorical variable.
metadata
data in the role of describing some object(s)
microdata
data that represent individuals from some unit type, universe, or population
compare with macrodata
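Illustrative sketch: the relationship between microdata and macrodata can be shown by aggregating unit-level records into counts. The records and field names are invented for this example.

```python
from collections import Counter

# Hypothetical microdata: one record per individual.
microdata = [
    {"person": "P1", "region": "North", "employed": True},
    {"person": "P2", "region": "North", "employed": False},
    {"person": "P3", "region": "South", "employed": True},
    {"person": "P4", "region": "North", "employed": True},
]

# Macrodata: an aggregate (count of employed persons) per region.
employed_by_region = Counter(r["region"] for r in microdata if r["employed"])

print(employed_by_region["North"])  # 2
print(employed_by_region["South"])  # 1
```

The aggregate no longer identifies any individual record, which is the distinction the two definitions draw.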
multi-dimensional data
synonym: data cube, n-cube
data for which observations are logically organized by multiple facets (e.g. space, time, social determinants…) called dimensions that function as the axes of a coordinate system.
This is referred to as a data cube or n-cube in some communities. Tables can be understood as a presentation of an n-cube. Time is given special consideration in some communities as an essential property of an observation. Some also use time as a dimension.
observation
value for a variable for a particular unit
the result of an objective measurement process
physical record
stored representation of the values of a set of instance variables
how a logical record is implemented for storage in an information system. Instance of a physical record structure.
physical record structure
pattern according to which a set of instance variables are stored
mapping from logical record to physical record
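Illustrative sketch, assuming a fixed-width text layout as the physical record structure (only one of many possible storage patterns): the logical record lists the instance variables in order, the layout gives each a column width, and the encoded line is the physical record. All names, widths, and values are hypothetical.

```python
# Logical record: an ordered set of instance variables (hypothetical names).
logical_record = ["person_id", "age", "region"]

# Physical record structure: hypothetical fixed-width layout (variable -> width).
layout = {"person_id": 4, "age": 3, "region": 2}


def encode(values, logical_record, layout):
    """Store the values as one fixed-width line (the physical record)."""
    return "".join(str(values[name]).rjust(layout[name]) for name in logical_record)


def decode(line, logical_record, layout):
    """Read a fixed-width line back into variable -> text value pairs."""
    values, start = {}, 0
    for name in logical_record:
        width = layout[name]
        values[name] = line[start:start + width].strip()
        start += width
    return values


physical_record = encode(
    {"person_id": "P001", "age": 34, "region": "NO"}, logical_record, layout
)
print(repr(physical_record))  # 'P001 34NO'
print(decode(physical_record, logical_record, layout)["age"])  # 34
```

The same logical record could be stored under a different layout (for example, delimited text) without changing which variables it contains.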
population
universe in which the individuals share time and geography
Typically the individuals are located in the same temporal and spatial extent. A pan-European survey administered in France and Greece might have a distinct population for each country.
[TBD-- review usage in lifecycle and codebook to compare]
question
a formal interrogative statement used to collect an observation
questionnaire
organized set of questions designed to collect information on specific topics from a respondent
register
authoritative list of items maintained for the purpose of documenting and promoting consistent usage
used as a source of administrative data for research on the relevant subjects, e.g., tax, births and deaths; typically secure and authoritative. A register is maintained by a registry.
registry
rules, activities, and mechanisms for maintaining and accessing a register
characterized by the ability to enter, classify, describe, and manage items in a register according to a set of rules
see ISO/IEC 11179-6 (Registration) for a formal description (https://standards.iso.org/ittf/PubliclyAvailableStandards/c078916_ISO_IEC%2011179-6_2023(en).zip); common examples include the SDMX Global Registry and the DDI Agency ID Registry
repository
place where items can be deposited for preservation and retrieval
A database of metadata and/or data intended to support search, discovery, and reuse
A database and a file system are particular implementations of a repository.
represented variable
specification for the encoding of substantive values of a variable, based on a conceptual variable
Represented variable is middle level of the variable cascade. Includes substantive value domains. Reusable across all populations within the universe it describes.
resource package (DDI Lifecycle)
reusable metadata outside the structure of a specific study or series
SDTL [acronym]
Structured Data Transformation Language; language for representing data transformation commands
NOTE: a DDI product originally developed as part of the C2Metadata project; designed to describe processing similar to that in packages like R, Stata, SPSS, and SAS. Used for documentation.
sentinel value
value indicating missing, refused, or other invalid data result
Codes for representing sentinel values (missing, refused) are provided by each major statistical package, but differ between packages.
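Illustrative sketch: analysis code typically separates sentinel values from substantive values before computing statistics. The codes -9 and -8 are assumptions chosen for this example; actual codes differ between statistical packages and studies.

```python
# Hypothetical sentinel codes; real codes vary by package and study.
SENTINELS = {-9: "missing", -8: "refused"}

raw_ages = [34, -9, 51, -8, 27]

# Keep only substantive values for estimation.
substantive = [v for v in raw_ages if v not in SENTINELS]

# Record why each excluded value is not usable.
reasons = [SENTINELS[v] for v in raw_ages if v in SENTINELS]

mean_age = sum(substantive) / len(substantive)
print(substantive)  # [34, 51, 27]
print(reasons)      # ['missing', 'refused']
```

Including the sentinel codes in the mean would silently distort the estimate, which is why they are filtered out first.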
series [DDI]
collection of studies related for some purpose
for example using the same or similar questions and collecting the same or similar variables; includes longitudinal and repeat cross-sectional studies; similar to a statistical program in an official statistics context.
statistical classification
hierarchical classification scheme in which categories must be mutually exclusive and exhaustive at each level.
Subjects might include e.g. industries, occupations, diseases, education levels, etc.
Might be part of a group of classifications, maintained as a series of official versions. See usage in XKOS, CDI, Lifecycle, and GSIM. Examples include the North American Industry Classification System (NAICS) and the International Classification of Diseases (ICD).
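Illustrative sketch: the "mutually exclusive and exhaustive" requirement can be checked for one level of a classification by confirming that no item falls under two categories and that every item in scope is covered. The category labels, items, and function name are hypothetical and do not come from any real classification.

```python
# Hypothetical level of a classification: category -> items it covers.
level = {
    "A": {"farming", "fishing"},
    "B": {"mining"},
    "C": {"retail", "wholesale"},
}

# Hypothetical scope that the level is expected to cover completely.
scope = {"farming", "fishing", "mining", "retail", "wholesale"}


def is_exclusive_and_exhaustive(level, scope):
    """True if no item is in two categories and all items in scope are covered."""
    seen = set()
    for members in level.values():
        if seen & members:
            return False  # overlap: categories are not mutually exclusive
        seen |= members
    return seen == scope  # exhaustive only if everything in scope is covered


print(is_exclusive_and_exhaustive(level, scope))  # True
```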
study (noun, DDI)
organized activity and artifacts related to the design, collection, processing, and dissemination of data for the observation and analysis of a particular phenomenon
Used as an organizing principle for packages of data and metadata. May cover several related data files/datasets, surveys, methods, etc.
substantive value
subject matter-related value assigned by a variable for a unit that is useful for analysis or estimation
not processing-related (e.g. sentinel value)
survey (noun)
data collection activity based on a sample of some population used to estimate characteristics of that population
distinct from census or other studies based on an entire population
survey instrument
instrument based on a questionnaire
NOTE: specialization of instrument. DDI Lifecycle includes a detailed model for survey instrument that is machine actionable.
time series
set of measurements based on the same measure made at different times
NOTE: Ideally all the measurements are comparable. Used as an important formalization with aggregates, where measures are multi-dimensional. A time series is the same in all dimensions except for time. The term is also used for observations in longitudinal and repeat cross-sectional contexts, where the observations are assumed to be comparable. Commonly the interval between measurements is regular.
unit
individual member of a group that is being measured
a distinct member of a universe. Technically, an individual whose properties correspond to the characteristics of a unit type. Properties distinguish units; characteristics distinguish unit types.
unit of measurement
consistent and agreed-upon scale for expressing, quantifying, and comparing values of a measure
NOTE: A quantity is used to define a measurement reference system. A quantity kind, e.g. length, temperature, currency, has specific quantities for that kind, e.g. meter, kelvin, euro, respectively.
unit type
class of units defined by essential characteristics
Examples: person, household, business establishment. A kind of entity based on a set of characteristics. The characteristics are specific to the study context. Ideally a category of individuals in a non-overlapping classification.
universe
class of individuals that share a unit type and typically have other characteristics in common, exclusive of time and geography
Given a unit type, a universe is the domain of those ‘units’ that is observed. For example a unit type might be ‘humans’ and a universe might be ‘humans who are nurses’. A population is a universe at a given place and time. Some definitions of universe are more or less precise about the characteristics differentiating universe and population, e.g. DDI Lifecycle and DDI Codebook. The definition here is consistent with DDI CDI.
URI (Uniform Resource Identifier)
compact sequence of characters that identifies an abstract or physical resource
defined in IETF RFC 3986. URL and URN are kinds of URI, see IETF RFC 3305. Each URI begins with a scheme name that refers to a specification for identifiers within that scheme (see IETF RFC 7595 ).
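Illustrative sketch: the scheme name at the start of a URI can be read with Python's standard urllib.parse module. The two URIs are examples only; the second is a URN built from an ISBN for illustration.

```python
from urllib.parse import urlparse

uris = [
    "https://ddialliance.org/products/ddi-agency-id-registry",  # a URL
    "urn:isbn:0262133210",                                      # a URN
]

# The scheme is the part before the first colon.
schemes = [urlparse(u).scheme for u in uris]
print(schemes)  # ['https', 'urn']
```

Both strings are URIs; the scheme indicates which specification governs the rest of the identifier.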
URN (Uniform Resource Name)
URI intended to serve as persistent, location-independent, resource identifier
DDI is in the process of registering a URN namespace and syntax for identification of resources that conform to the standards published by the Data Documentation Initiative (DDI) Alliance. The registration process and identifier syntax rules are described in IETF RFC 2141. Originally, the idea was that a URI would be either a URL or a URN. See discussion in IETF RFC 3305.
value (DDI)
concept represented by a datum.
NOTE: A datum is a representation of a value. A value is a concept underlying a datum.
value (computer science)
representation of some entity that can be manipulated by a computer program
Definition from Mitchell, 1996 (ISBN 0-262-13321-0). Discussions of ideas like ‘literal value’ and ‘passing value by reference’ occur in DDI documentation, but use the computer science meaning of the term rather than the DDI sense.
For reference, highlight differences between GSIM and DDI: ‘A Datum is the actual instance of data that was collected or derived. It is the value which populates a Data Point. A Datum is the value found in a cell of a table.’ (GSIM 1.2 definition, specified as synonym of value)
value domain
set of allowed values
DDI-CDI defines value domain as a ‘Set of permissible values for a variable (adapted from ISO/IEC 11179)’.
variable
mapping to a value domain from a set of units corresponding to a single unit type.
A mapping is an association. The usage in DDI does not correspond exactly with other common uses (e.g. https://en.wikipedia.org/wiki/Variable_and_attribute_(research)), but is specific to data description. If the unit is a person, their gender would be a characteristic. The values of a characteristic are also concepts, e.g. ‘male’. The variable here is ‘gender’; the value ‘male’ is a category. Values that a variable assigns are properties of the units being measured.
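Illustrative sketch: read literally, the definition describes a variable as a mapping from units to values in a value domain. The unit IDs, the value domain, and the helper name below are assumptions for this example.

```python
# Hypothetical units of a single unit type (persons).
units = ["P1", "P2", "P3"]

# Hypothetical value domain for the variable 'gender'.
value_domain = {"male", "female"}

# The variable: an association from each unit to one value in the domain.
gender = {"P1": "female", "P2": "male", "P3": "female"}


def is_valid_variable(mapping, units, value_domain):
    """True if every unit is mapped and every assigned value is in the domain."""
    covers_units = set(mapping) == set(units)
    in_domain = all(value in value_domain for value in mapping.values())
    return covers_units and in_domain


print(is_valid_variable(gender, units, value_domain))  # True
```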
variable (DDI Lifecycle)
synonym of instance variable as used in DDI-CDI.
variable cascade [Lifecycle, CDI]
hierarchical description of a variable comprising conceptual, represented, and instance levels
key concept in DDI Lifecycle, GSIM, and CDI. Cascade is designed to maximize reuse of metadata and facilitate data discovery and integration.
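Illustrative sketch: the three levels of the cascade can be modeled as nested records, where one represented variable is reused by instance variables in different datasets. The class and field names are hypothetical and are not the names used in the DDI Lifecycle, DDI-CDI, or GSIM models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    name: str
    concept: str  # semantics only, no representation


@dataclass(frozen=True)
class RepresentedVariable:
    conceptual: ConceptualVariable
    value_domain: str  # how substantive values are encoded


@dataclass(frozen=True)
class InstanceVariable:
    represented: RepresentedVariable
    dataset: str  # the particular dataset that gives the context
    column: str


concept = ConceptualVariable(name="age", concept="time elapsed since birth")
age_years = RepresentedVariable(conceptual=concept, value_domain="integer years")

# Two datasets reuse the same represented variable under different column names.
iv_2020 = InstanceVariable(represented=age_years, dataset="survey_2020", column="AGE")
iv_2021 = InstanceVariable(represented=age_years, dataset="survey_2021", column="Q3_AGE")

print(iv_2020.represented is iv_2021.represented)  # True
```

Sharing the upper levels is what allows variables from different datasets to be recognized as comparable.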
versionable (noun)
identifiable that can have individually distinguished versions
Allows evolving definition of usage, content, implementation of some information entity. Each version is an update based on the preceding version, defining a partial order.
wide data
data structure in which rows represent units and columns represent variables
Each row is a record representing a unit. Also referred to as a rectangular structure.
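Illustrative sketch: a wide structure has one row per unit and one column per variable, and it can be reshaped into one row per unit-variable pair. The column names, values, and the wide_to_long helper are assumptions for this example.

```python
# Hypothetical wide data: each row is a unit, each key is a column (variable).
wide_rows = [
    {"unit_id": "P1", "age": 34, "income": 52000},
    {"unit_id": "P2", "age": 51, "income": 47000},
]


def wide_to_long(rows, id_column="unit_id"):
    """Turn each (unit, variable) cell into its own (unit, variable, value) row."""
    long_rows = []
    for row in rows:
        for variable, value in row.items():
            if variable != id_column:
                long_rows.append((row[id_column], variable, value))
    return long_rows


reshaped = wide_to_long(wide_rows)
print(len(reshaped))  # 4
print(reshaped[0])    # ('P1', 'age', 34)
```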
XKOS - eXtended Knowledge Organization System.
extension of Simple Knowledge Organization System (SKOS) vocabulary that includes elements required to describe statistical classifications.
Adds semantics for relationships, levels in hierarchy, association of a concept with a hierarchy level. Adds more granular relations for broader/narrower. Also defines ordering relations (following/predecessor? transitive; next/previous not transitive…). Rendered in RDF. XKOS is a DDI product; SKOS is a W3C product. See https://rdf-vocabulary.ddialliance.org/xkos.html
2024-01-24 Finished workgroup review of terms and definitions!! Next-- send to scientific board.
Possible Terms to add
characteristic (see CDI appendix)
concept (see CDI appendix)
DDI Alliance
discoverability
methodology
reuse ? [add because of frequent usage in DDI specs]
vocabulary
other terms with DDI-specific usage.
terms that are used in single, generic, widely understood sense are not included
Acronyms
CDI
DDI
DISCO
DTD
ESS -- European Social Survey -should be in list of acronyms. not really a concept (2022-01-26)
do we really need this - how is it related to DDI?
IHSN
SDTL
SKOS
XKOS
Terms removed
versioning
XML schema
variable type