Glossary work

Final draft 2024-07-03

ISO 704 standard: Terminology work — Principles and methods https://edisciplinas.usp.br/pluginfile.php/312607/mod_resource/content/1/ISO 704.pdf

DDI documents in the scope of this glossary. Definitions are intended to apply to usage or understanding of these documents:

Conventions:

  • Terms are written in lower case unless they are proper nouns, in which case they are capitalized.

  • Context for usage of a term is indicated with angle brackets ('<','>').

  • Terms defined in this glossary are indicated in bold when used in definitions.

  • Definitions are written with the principle of substitution in mind. The definition can replace the defined term in any place where it is used.

Glossary:

  • archive (n.)

    • collection of records, objects, metadata, or data intended for retention for as long as necessary

      • NOTE: Archives are typically managed based on established standards. Retention implies maintenance of the integrity, security, authenticity, and accessibility of the collection items according to policies established by the archive. DDI-Lifecycle supports various archive activities.

  • archive activity

    • activity that supports the future use of metadata and data through preservation, documentation, and access options.

      • NOTE: Archive activities can be done by anyone who manages the metadata and data during their lifetime and do not need to be done by a formal archive.

  • administrative metadata (n.)

    • content that is related to the interaction or use of the metadata within a specific system

  • category

    • concept used to group objects

      • NOTE: The meaning of a category is based on the unifying characteristics of its group of objects.

  • classification scheme

    • an organized set of categories defined within some scope

  • Codebook [standard] (n.)

    • see: DDI-Codebook

  • codebook [instance] (n.)

    • instance of a study description conforming to the DDI-Codebook standard

  • codebook (n.)

    • description of the methodology, questions, variables, codelists, and other aspects of a study and the data produced

  • codelist

    • list of code-category pairs, where a unique code represents each category
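
The code-category pairing can be illustrated with a minimal sketch. The codelist content and the function names below are hypothetical and are not taken from any DDI specification; a dictionary is used only because its keys are unique by construction, which mirrors the requirement that a unique code represents each category.

```python
# Hypothetical codelist: each unique code stands for exactly one category.
marital_status_codelist = {
    "1": "never married",
    "2": "married",
    "3": "widowed",
    "4": "divorced",
}


def decode(code, codelist):
    """Return the category that the given code represents."""
    return codelist[code]


def has_unique_categories(codelist):
    """Check that no category is represented by more than one code."""
    return len(set(codelist.values())) == len(codelist)
```

Calling decode("2", marital_status_codelist) returns the category paired with code "2".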

  • conceptual variable

    • description of the semantics of a variable independent of any particular representation or implementation

      • NOTE: A reusable, partial description of a variable limited to semantics. Semantics includes the meaning of values in the value domain, the characteristic represented, and all the associations between units and assigned values. Most general (top) level in the variable cascade, above represented variable and instance variable. Used in DDI-Lifecycle, DDI-CDI, and GSIM (Generic Statistical Information Model).

  • controlled vocabulary

    • list of standardized terminology, words, or phrases used for indexing, content analysis, or information retrieval (CASRAI, CODATA)

      • NOTE: Usually in a defined information domain. For DDI, controlled vocabularies are standardized under the DDI Alliance. Other controlled vocabularies might not be formal standards, but are agreed upon in some context.

  • controlled vocabulary [DDI]

    • see DDI Controlled Vocabulary

  • correspondence

    • relationship that asserts a degree of similarity between two concepts

      • NOTE: The term crosswalk is sometimes used to refer to an individual correspondence.

  • correspondence table

    • synonym: crosswalk

    • set of correspondences between two distinct sets of concepts

      • NOTE: used to map similar things in collections. Concepts might be represented by terms in a vocabulary or classification system, fields in a database, entities, characteristics, or properties in a data model.
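
A correspondence table can be sketched as a list of individual correspondences. This is an illustrative sketch only: the concept identifiers ("A1", "B10", and so on), the similarity labels, and the helper name are assumptions made for the example and do not come from a DDI specification.

```python
from typing import NamedTuple


class Correspondence(NamedTuple):
    source: str      # concept in the source set
    target: str      # concept in the target set
    similarity: str  # degree of similarity (hypothetical labels)


# Hypothetical crosswalk between two distinct sets of concepts, A and B.
crosswalk = [
    Correspondence("A1", "B10", "exact"),
    Correspondence("A2", "B20", "partial"),
    Correspondence("A2", "B21", "partial"),
]


def targets_for(concept, table):
    """Return (target, similarity) pairs for one source concept."""
    return [(c.target, c.similarity) for c in table if c.source == concept]
```

A source concept may correspond to several target concepts, which is why the lookup returns a list rather than a single value.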

  • crosswalk

    • see correspondence table

      • NOTE: The term crosswalk is sometimes used to refer to an individual correspondence.

  • Data Documentation Initiative [product]

    • synonym: DDI

    • suite of open, human-readable, and machine-actionable specifications used internationally for describing the data produced with surveys and other observational methods in the social, behavioral, economic, and health science domains

  • Data Documentation Initiative [activity]

    • synonym: DDI

    • organized effort to produce metadata standards for the description of social, behavioral, and economic data to foster data reuse and interoperability.

      • NOTE: Started in 1995, following on work started in the 1980s. Currently under the stewardship of the DDI Alliance.

  • data lifecycle

    • stages of the data production and management process to support research and policy covering conceptualization, design, acquisition, processing, analysis, sharing/dissemination, and archiving

      • NOTE: There are various models that emphasize different aspects of the lifecycle, for example the model of the Digital Curation Centre (DCC). The DDI data lifecycle emphasizes aspects related to data production in the social, behavioral, economic, and health science domains.

  • datum

    • representation of a concept intended for information processing purposes

      • NOTE: Typically an alphanumeric string; independent of subject area. Used in DDI-CDI, where it is defined as “a designation of a value.” Commonly used as the most granular representation of information in a data processing or management system. A datum is described in DDI standards as a representation of dates, categories, numbers (quantities and percentages), and text.

  • DDI

    • see Data Documentation Initiative

  • DDI agency

    • registered entity responsible for assigning identifiers to DDI metadata items to ensure their uniqueness, and for the maintenance and versioning of those items

      • NOTE: May be a person, project, or organization that intends to use DDI and registers to be assigned a recognized agency identifier, which is a prefix for all the DDI identifiers it assigns. The registration service is available at https://ddialliance.org/products/ddi-agency-id-registry

  • DDI-CDI [acronym; standard] (n.)

  • DDI-Codebook [standard] (n.)

    • synonym: DDI-C

    • DDI standard defining a schema for a simple description of a study without the capability to link to descriptions of other studies

      • NOTE: Originally DTD-based, DDI-C is an XML Schema for validating the XML serialization of the DDI-Codebook standard. Supports discovery, preservation, and the informed use of data. Defines descriptive content for variables, files, source material, and study-level information. Includes metadata about a study, e.g. the questionnaire and how the study was conducted.

      • NOTE: See https://ddialliance.org/Specification/DDI-Codebook/2.5/

  • DDI Controlled Vocabulary [technical product] (n.)

    • controlled vocabulary that can be used with DDI as well as for other purposes and applications

  • DDI DISCO [acronym; technical product] (n.)

  • DDI Instance [DDI-Lifecycle]

    • root element (<DDIInstance>) of a related set of DDI metadata as specified in the DDI-Lifecycle XML Schema

  • DDI instance [XML]

    • XML instance containing DDI metadata as specified in a DDI standard

      • NOTE: In non-XML syntax representations, the term “instance” may be used in a similar fashion, denoting the instantiation of a class, as in the DDI-CDI model.

  • DDI-Lifecycle

    • synonym: DDI-L

    • statistical metadata standard describing the data lifecycle as defined in the DDI-Lifecycle Model

      • NOTE: One of several standards from the DDI Alliance.

  • DDI-Lifecycle Model [diagram]

    • depiction of the stages of the data lifecycle

      • NOTE: Version of the diagram from Inside View of DDI Version 3.0 (Thomas, Gregory and Piazza, 2005). This is informative; there is no formalization of this diagram in any DDI specification.

  • DDI-Lifecycle Model [information model]

    • specification of metadata used and reused throughout the data lifecycle

  • DDI Lite

    • subset of DDI-Codebook elements specifying basic information describing a dataset

  • DDI Profile [General]

  • DDI Profile [DDI-Lifecycle]

    • DDI Profile [General] of the DDI-Lifecycle specification expressed as XML schemas

      • NOTE: defined in the DDI-Lifecycle specification in the element <DDIProfile>. As DDI-Lifecycle evolves, specifying profiles using other serialization schemes (e.g. RDF, JSON) is planned.

  • DDI scheme [DDI-Lifecycle]

    • package of related metadata items of a single type (e.g., concepts, variables, categories) for the purposes of data/metadata management and reuse, owned and maintained by a DDI agency

      • NOTE: defined in the DDI-Lifecycle specification. Analogous to a database table or a lookup list.

      • Examples: <RepresentedVariableScheme>, <QuestionScheme>, <ConceptScheme>, <CategoryScheme>.

  • DDI SDTL [technical product] (n.)

    • synonym: SDTL, Structured Data Transformation Language

    • language for representing data transformation commands

  • dimensional data

    • data organized as a matrix with each axis defined by a set of categories

      • NOTE: Synonyms: multidimensional data, data cube, N-Cube. The category sets define a coordinate system for identifying and describing individual datums, sometimes supplemented with time, or other non-categorical variables. Distinct from unit record data or sensor data.

      • Examples: Possible axes: Educational attainment level, age range, region, occupation, industry, and gender.
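
The idea that category sets act as a coordinate system can be sketched as follows. The axes, categories, and counts are invented for illustration, and the helper names are hypothetical; the sketch stores each datum under a tuple with one category per axis.

```python
from itertools import product

# Hypothetical axes, each defined by a set of categories.
axes = {
    "region": ["north", "south"],
    "age_range": ["15-24", "25-64"],
}

# Hypothetical counts; each key is a coordinate (one category per axis).
cube = {
    ("north", "15-24"): 120,
    ("north", "25-64"): 340,
    ("south", "15-24"): 95,
    ("south", "25-64"): 410,
}


def cell(*coordinate):
    """Return the datum identified by one category per axis."""
    return cube[coordinate]


def total_by(axis_name, category):
    """Sum all cells whose coordinate has the given category on one axis."""
    position = list(axes).index(axis_name)
    return sum(v for key, v in cube.items() if key[position] == category)
```

Every combination of categories produced by product(*axes.values()) identifies exactly one cell, which is what distinguishes this structure from unit record data.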

  • discovery (n.)

    • location and evaluation of available data or related/supplementary resources

      • NOTE: Enabled by search engines and catalogs that draw on metadata, such as that modeled using the DDI specifications, to make resources discoverable (see the ‘Findable’ part of FAIR).

  • dissemination

    • distribution of either data with related/supplementary resources or metadata for the purposes of use and reuse

  • DTD [acronym]

    • Document Type Definition

    • XML tagset definition for marking up a class of documents

      • NOTE: Document description language that predates XML Schema. The first version of DDI-Codebook was represented using a DTD. DTDs have since been superseded by W3C XML Schema for representing DDI work products. See DDI taxonomy page.

  • external reference [DDI specification]

    • link to a resource whose content is not represented inline in a DDI instance

      • NOTE: This is used when making reference to resources that complement the content of the metadata, e.g. a pdf copy of a questionnaire, or other DDI metadata intended for reuse that is published elsewhere.

  • FAIR [acronym] (adj.)

    • FAIR (Findable, Accessible, Interoperable, Reusable) Principles in the context of data publication or sharing.

      • NOTE: For details, see Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3. 160018. https://doi.org/10.1038/sdata.2016.18

  • Genericode [OASIS]

  • identifiable [DDI-Lifecycle] (n.)

    • class of metadata objects that can be assigned an identifier

      • NOTE: used in DDI-Lifecycle, motivated by need to reference or reuse some information object.

  • IHSN [acronym]

    • International Household Survey Network; an informal network of organizations using DDI-Codebook

  • inline

    • represented explicitly in the XML instance

      • NOTE: As opposed to included by reference to a separate resource. DDI-Lifecycle uses schemes to enable reuse of metadata, and these can be inline or external.

  • instance variable

    • description of a variable in the context of a particular dataset

      • Most granular element in variable cascade

  • instrument

  • interoperability

  • key-value data [data structure]

    • data structure consisting of ordered pairs comprising a key (identifier) and value (datum)

      • NOTE: The structure is a set of ordered key-value pairs. The identifier is a pointer used to locate the value in the data structure. Values can be nested key-value structures. Typically the key is intended to encode or reference some semantics of the value. Within the context of a dataset, the key is considered unitary; it does not necessarily imply any internal structure.
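
A minimal sketch of key-value data with one nested structure is shown here. The record content and the function name are hypothetical and serve only to illustrate how a key locates a value, including a value that is itself a key-value structure.

```python
# Hypothetical record: each key locates one value; a value may itself be
# a nested key-value structure.
record = {
    "respondent_id": "R001",
    "age": 34,
    "address": {"region": "north", "postcode": "0000"},
}


def lookup(data, *keys):
    """Follow a sequence of keys through nested key-value structures."""
    value = data
    for key in keys:
        value = value[key]
    return value
```

For example, lookup(record, "address", "region") follows two keys in turn to reach a nested datum.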

  • logical record [DDI-CDI, DDI-Lifecycle]

    • set of instance variables

      • Used in description of structure of a dataset. Does not specify format or specific implementation. The set is ordered in DDI-CDI, but not assumed or required to be an ordered set in DDI-Lifecycle.

  • long data

    • data structure in which each row contains a variable, its value, unit ID, and possibly some attribute values

      • Data structure consisting of separate records that associate an entity (unit) instance with one variable.

      • NOTE: Like an event history; each row holds one variable for one unit. New variables are accommodated by adding rows rather than columns. The values can be qualified with attributes that add information about observation method, time, etc.
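
The long layout can be sketched by reshaping a small wide table, in which each unit occupies one row, into rows that each associate a unit with one variable and its value. The field names and values are hypothetical, and the function is an illustration rather than a DDI-defined transformation.

```python
# Hypothetical wide data: one row per unit, one column per variable.
wide_rows = [
    {"unit_id": "P1", "age": 34, "income": 52000},
    {"unit_id": "P2", "age": 58, "income": 41000},
]


def to_long(rows, id_field="unit_id"):
    """Emit one row per (unit, variable) pair."""
    long_rows = []
    for row in rows:
        for variable, value in row.items():
            if variable == id_field:
                continue  # the unit ID is carried on every long row instead
            long_rows.append(
                {"unit_id": row[id_field], "variable": variable, "value": value}
            )
    return long_rows


long_rows = to_long(wide_rows)
```

Recording a further variable for a unit then means appending another row to long_rows; no new column is needed.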

  • machine-actionable data

    • structured data represented so that a machine (computer) can be programmed to read and process each datum

      • NOTE: Processing is meant to be deterministic as opposed to probabilistic (e.g., natural language processing).

  • macrodata

    • data based on aggregates rather than individuals

      • compare with microdata

  • maintainable (n.)

  • measure (n.)

    • quantitative variable

      • NOTE: A numeric as opposed to a categorical variable.

  • metadata

    • data in the role of describing some object(s)

  • microdata

    • data that represent individuals from some unit type, universe, or population

      • compare with macrodata
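
The contrast between microdata and macrodata can be sketched by aggregating a few unit records into counts. The records below are invented for illustration; counting by region is just one of many possible aggregations.

```python
from collections import Counter

# Hypothetical microdata: one record per individual.
microdata = [
    {"person_id": "P1", "region": "north"},
    {"person_id": "P2", "region": "south"},
    {"person_id": "P3", "region": "north"},
]

# Macrodata derived from it: aggregates (counts per region), with no
# individual records remaining.
macrodata = Counter(record["region"] for record in microdata)
```

The resulting counts describe groups rather than individuals, so the original records cannot be recovered from them.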

  • multi-dimensional data

    • synonym: data cube, n-cube

    • data for which observations are logically organized by multiple facets (e.g. space, time, social determinants…) called dimensions that function as the axes of a coordinate system.

      • NOTE: This is referred to as a data cube or n-cube in some communities. Tables can be understood as a presentation of an n-cube. Time is given special consideration in some communities as an essential property of an observation. Some also use time as a dimension.