First draft 2023-07-26

ISO 704 standard: Terminology work — Principles and methods https://edisciplinas.usp.br/pluginfile.php/312607/mod_resource/content/1/ISO 704.pdf

...

  • multi-dimensional data

    • synonym: data cube, n-cube

    • data for which observations are logically organized by multiple facets (e.g. space, time, social determinants…) called dimensions that function as the axes of a coordinate system.

      • This is referred to as a data cube or n-cube in some communities. Tables can be understood as a presentation of an n-cube. Some communities give time special consideration as an essential property of an observation; some also use time as a dimension.

  • observation

    • value for a variable for a particular unit

      • the result of an objective measurement process

  • physical record

    • stored representation of the values of a set of instance variables

      • how a logical record is implemented for storage in an information system. Instance of a physical record structure.

  • physical record structure

    • pattern according to which a set of instance variables are stored

      • mapping from logical record to physical record

  • population

    • universe in which the individuals share time and geography

      • Typically the individuals are located in the same temporal and spatial extent. A pan-European survey administered in France and Greece might have a distinct population for each country.

      • [TBD-- review usage in lifecycle and codebook to compare]

  • question

    • a formal interrogative statement used to collect an observation

  • questionnaire

    • organized set of questions designed to collect information on specific topics from a respondent

  • register

    • authoritative list of items maintained for the purpose of documenting and promoting consistent usage

      • used as a source of administrative data for research on the relevant subjects, e.g., tax, births and deaths; typically secure and authoritative. A register is maintained by a registry.

  • registry

  • repository

    • place where items can be deposited for preservation and retrieval

      • A database of metadata and/or data intended to support search, discovery, and reuse

      • A database and a file system are particular implementations of a repository.

  • represented variable

    • specification for the encoding of substantive values of a variable, based on a conceptual variable

      • The represented variable is the middle level of the variable cascade. It includes substantive value domains and is reusable across all populations within the universe it describes.

  • resource package (DDI Lifecycle)

    • reusable metadata outside the structure of a specific study or series

  • SDTL [acronym]

    • Structured Data Transformation Language; language for representing data transformation commands

      • [Note] a DDI product originally developed as part of the C2Metadata project; designed to describe processing similar to that in packages like R, Stata, SPSS, and SAS. Used for documentation.

  • sentinel value

    • value indicating missing, refused, or other invalid data result

      • Codes for representing sentinel values (missing, refused) are provided by each major statistical package, but differ between packages.

  • series [DDI]

    • collection of studies related for some purpose

      • for example using the same or similar questions and collecting the same or similar variables; includes longitudinal and repeat cross-sectional studies; similar to a statistical program in an official statistics context.

  • statistical classification

    • taxonomy of categories organized by characteristics of a fixed subject used to group units for statistical purposes

      • Subject might include e.g. industries, occupations, diseases, education levels, etc.

      • Taxonomies are commonly hierarchical; classification categories must be mutually exclusive and exhaustive at a given level.

      • Might be part of a group of classifications, maintained as a series of official versions. See usage in XKOS, CDI, Lifecycle, and GSIM. Examples include the North American Industry Classification System (NAICS) and the International Classification of Diseases (ICD).

  • study (noun, DDI)

    • organized activity and artifacts related to the design, collection, processing, and dissemination of data for the observation and analysis of a particular phenomenon

      • Used as an organizing principle for packages of data and metadata. May cover several related data files/datasets, surveys, methods, etc.

  • substantive value

    • subject matter-related value assigned by a variable for a unit that is useful for analysis or estimation

      • not processing-related (e.g. sentinel value)

  • survey (noun)

    • data collection activity based on a sample of some population used to estimate characteristics of that population

      • distinct from census or other studies based on an entire population

  • survey instrument

    • instrument based on a questionnaire

      • NOTE: specialization of instrument. DDI Lifecycle includes a detailed model for survey instrument that is machine actionable.

  • time series

    • set of measurements based on the same measure made at different times

      • NOTE: Ideally all the measurements are comparable. Used as an important formalization with aggregates, where measures are multi-dimensional. A time series is the same in all dimensions except for time. The term is also used for observations in longitudinal and repeat cross-sectional contexts, where the observations are assumed to be comparable. Commonly the interval between measurements is regular.

  • unit

    • individual member of a group that is being measured

      • a distinct member of a universe. Technically, an individual whose properties correspond to the characteristics of a unit type

      • a distinct member of a universe or population; a member of a group that is being measured. Properties distinguish units; characteristics distinguish unit types.

  • unit of measurement

    • consistent and agreed-upon reference quantity for expressing, quantifying, and comparing values of a measure

      • NOTE: A reference quantity is used to define a measurement reference system. A quantity kind, e.g. length, temperature, currency, has specific reference quantities for that kind, e.g. meter, kelvin, Euro, respectively.

  • unit type

    • class of units defined by essential characteristics

      • Examples: person, household, business establishment. A kind of entity based on a set of characteristics. The characteristics are specific to the study context. Ideally a category of individuals in a non-overlapping classification.

  • universe

    • class of individuals that share a unit type and typically have other characteristics in common, exclusive of time and geography

      • Given a unit type, a universe is the domain of those units that is observed. For example, a unit type might be ‘humans’ and a universe might be ‘humans who are nurses’. A population is a universe at a given place and time. Some definitions of universe are more or less precise about the characteristics differentiating universe and population, e.g. DDI Lifecycle and DDI Codebook. The definition here is consistent with DDI CDI.

  • URI (Uniform Resource Identifier)

    • compact sequence of characters that identifies an abstract or physical resource

      • defined in IETF RFC 3986. URL and URN are kinds of URI; see IETF RFC 3305. Each URI begins with a scheme name that refers to a specification for identifiers within that scheme (see IETF RFC 7595).

  • URN (Uniform Resource Name)

    • URI intended to serve as persistent, location-independent, resource identifier

      • DDI is in the process of registering a URN namespace and syntax for identification of resources that conform to the standards published by the Data Documentation Initiative (DDI) Alliance. The registration process and identifier syntax rules are described in IETF RFC 2141. Originally, the idea was that a URI would be either a URL or a URN. See discussion in IETF RFC 3305.

  • value (DDI)

    • concept represented by a datum.

      • NOTE: A datum is a representation of a value. A value is a concept underlying a datum.

  • value (computer science)

    • representation of some entity that can be manipulated by a computer program

      • Definition from Mitchell, 1996 (ISBN 0-262-13321-0). Discussions of ideas like ‘literal value’ and ‘passing value by reference’ occur in DDI documentation, but they use the computer science meaning of the term rather than the DDI sense.

      • [For reference, highlight differences between GSIM and DDI: ‘A Datum is the actual instance of data that was collected or derived. It is the value which populates a Data Point. A Datum is the value found in a cell of a table.’ (GSIM 1.2 definition, specified as synonym of value)]

  • value domain

    • set of allowed values within a specified scope

      • DDI-CDI defines value domain as a ‘Set of permissible values for a variable (adapted from ISO/IEC 11179)’.

  • variable

    • mapping to a value domain from a concept used to define a characteristic of a set of units corresponding to a single unit type.

      • A mapping is an association. The usage in DDI does not correspond exactly with other common uses (e.g. https://en.wikipedia.org/wiki/Variable_and_attribute_(research)), but is specific to data description. If the unit is a person, their gender would be a characteristic. The values of these characteristics are also concepts, e.g. ‘male’. The variable here is ‘gender’; the value ‘male’ is a category. Variables hold the measurements of characteristics. Values that a variable assigns are properties of the units being measured.

  • variable (DDI Lifecycle)

    • synonym of instance variable as used in DDI-CDI.

  • variable cascade [Lifecycle, CDI]

    • hierarchical description of a variable comprising levels from conceptual to representation to implementation

      • key concept in DDI Lifecycle, GSIM, and CDI. Cascade is designed to maximize reuse of metadata and facilitate data discovery and integration.

  • variable type [Lifecycle, Codebook]

    • construct to characterize the application of a variable

      • in the context of Lifecycle and Codebook, a low-level description of the implementation of a variable, defined in XML schema.

  • versionable (noun)

    • identifiable that can have individually distinguished versions

      • Allows evolving definition of usage, content, implementation of some information entity. Each version is an update based on the preceding version, defining a partial order.

  • wide data

    • data structure in which rows represent units and columns represent variables

      • Each row is a record representing a unit.

  • XKOS - eXtended Knowledge Organization System.

    • extension of Simple Knowledge Organization System (SKOS) vocabulary that includes elements required by statistical offices.

      • Adds semantics for relationships, levels in a hierarchy, and the association of a concept with a hierarchy level. Adds more granular relations for broader/narrower. Also defines ordering relations (following/predecessor? transitive; next/previous not transitive…). Rendered in RDF. See https://rdf-vocabulary.ddialliance.org/xkos.html. XKOS is a DDI product; SKOS is a W3C product.

  • XML Schema

...
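
As an illustration of how the entries for wide data, sentinel value, and substantive value relate, the following is a minimal Python sketch and not part of any DDI specification. The sentinel codes (-9, -8), the variable names (age, income), the sample rows, and the helper functions are all hypothetical, chosen only for the example; real sentinel codes differ between statistical packages.

```python
# Hypothetical sentinel codes (assumption for this sketch only).
SENTINELS = {-9: "refused", -8: "missing"}

# Wide data: each row is a record representing one unit,
# and each column (key) is a variable.
wide_rows = [
    {"unit_id": 1, "age": 34, "income": 52000},
    {"unit_id": 2, "age": -9, "income": 48000},
    {"unit_id": 3, "age": 51, "income": -8},
]


def substantive_values(rows, variable):
    """Return the values of one variable that are not sentinel values."""
    return [row[variable] for row in rows if row[variable] not in SENTINELS]


def sentinel_counts(rows, variable):
    """Count how often each sentinel meaning occurs for one variable."""
    counts = {}
    for row in rows:
        meaning = SENTINELS.get(row[variable])
        if meaning is not None:
            counts[meaning] = counts.get(meaning, 0) + 1
    return counts


print(substantive_values(wide_rows, "age"))
print(sentinel_counts(wide_rows, "income"))
```

Keeping the sentinel codes in a separate lookup, rather than mixing them into the analysis, mirrors the distinction the glossary draws between processing-related values and subject matter-related values.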