Metadata for variables and attributes
discovery, assessability, integratability
Discovery could be multi-layered, for example:
- discovery across domain-specific repositories; human interaction required
- discovery in a portal that provides pre-selected and possibly harmonized data; more automation might be possible in search and discovery
Scenarios:
Temporal integration example: sensor data every 3 hours vs. clinical data aggregated by year
Spatial integration: point-located data (sensors) vs. municipality/neighborhood (admin region). The admin region definition might be time dependent.
Measurement alignment: units, conceptual, procedural
Sampling feature identification, e.g. identify households/individuals to correlate environmental and clinical data; aggregated data; intentionally fuzzy data (fossil locations)
hand function reported at various times of the day: capture context of response
how to deal with Sentinel (a la DDI) values
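The temporal integration scenario above can be sketched as follows. This is a minimal illustration, not a recommended method: the readings, the clinical counts, and the choice of a simple yearly mean are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical 3-hourly sensor readings as (timestamp, value) pairs,
# spanning the change from 2020 to 2021.
start = datetime(2020, 12, 31, 0, 0)
readings = [(start + timedelta(hours=3 * i), float(i)) for i in range(16)]


def aggregate_by_year(observations):
    """Group (timestamp, value) pairs by calendar year and average them."""
    buckets = defaultdict(list)
    for timestamp, value in observations:
        buckets[timestamp.year].append(value)
    return {year: mean(values) for year, values in sorted(buckets.items())}


yearly_sensor = aggregate_by_year(readings)

# Hypothetical clinical counts that are already aggregated by year.
clinical_by_year = {2020: 120, 2021: 95}

# Join the two series on the shared temporal unit (the year).
joined = {
    year: (yearly_sensor[year], clinical_by_year[year])
    for year in sorted(yearly_sensor.keys() & clinical_by_year.keys())
}
```

The join is only possible because the finer-grained series is coarsened to the resolution of the coarser one, which is why the temporal resolution of each variable needs to be stated in the metadata.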
Questions:
find surveys that ask about party affiliation
find surveys that have asked questions like X (question reuse)
find surveys that have response ranges like Y
how many people came into the hospital with respiratory infections
find surveys with similar response populations
Solutions:
index questions
controlled vocabulary to classify question topics
include response value domain in metadata
viewpoints (DDI).
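A minimal sketch of the first three solutions (a question index, topic classification, and response value domains in the metadata) might look like this. The `Question` structure, the topic terms, and the sample surveys are hypothetical, and a substring match stands in for real text similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    survey_id: str
    text: str
    topic: str               # term from a hypothetical controlled vocabulary
    response_domain: tuple   # allowed response values


# Hypothetical index entries.
QUESTIONS = [
    Question("S1", "Which party do you usually vote for?",
             "party-affiliation", ("A", "B", "C", "none")),
    Question("S2", "How would you rate your health?",
             "self-rated-health", (1, 2, 3, 4, 5)),
    Question("S3", "Which political party do you feel closest to?",
             "party-affiliation", ("A", "B", "none")),
]


def surveys_by_topic(questions, topic):
    """Find surveys that ask about a given topic."""
    return sorted({q.survey_id for q in questions if q.topic == topic})


def surveys_by_text(questions, phrase):
    """Find surveys whose question text contains a phrase (case-insensitive)."""
    needle = phrase.lower()
    return sorted({q.survey_id for q in questions if needle in q.text.lower()})


def surveys_by_response_domain(questions, domain):
    """Find surveys with a question whose response values match exactly."""
    wanted = set(domain)
    return sorted(
        {q.survey_id for q in questions if set(q.response_domain) == wanted}
    )
```

For example, `surveys_by_topic(QUESTIONS, "party-affiliation")` answers the first question in the list above, and `surveys_by_response_domain(QUESTIONS, range(1, 6))` answers the one about response ranges.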
unit data records, aggregate records, object oriented/sparse data, network data structures
data model for data objects. A data object can be the type for a variable.
Data objects are inputs to processes
integrating on sampling feature (same person, same house, same rock sample)
Integrate on property type
instance variable relation (ISO/IEC 11404): adjacency list; implement factory of data types
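Integrating on a sampling feature amounts to a join on a shared feature identifier. The sketch below assumes both datasets expose the same household identifier; the field names and values are hypothetical.

```python
# Hypothetical records keyed by a shared sampling-feature identifier.
environmental = [
    {"household": "H1", "pm25": 12.0},
    {"household": "H2", "pm25": 30.5},
    {"household": "H4", "pm25": 8.1},
]
clinical = [
    {"household": "H1", "respiratory_visits": 0},
    {"household": "H2", "respiratory_visits": 3},
    {"household": "H3", "respiratory_visits": 1},
]


def join_on_feature(left, right, key):
    """Inner join two record lists on a shared sampling-feature identifier."""
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


linked = join_on_feature(environmental, clinical, "household")
```

Only households present in both datasets survive the join (here H1 and H2), which is why it matters whether the units carry identifiers at all.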
High level metadata:
Concepts for ddi:ConceptualVariable and ddi:UnitType (ssn:Property, ssn:FeatureOfInterest)
Capture information (URI for om:Procedure); ramify to instruments, protocols, sensors at property/variable/attribute level.
what else about variables?
Comparison with the DXWG dataset extension (to DCAT) proposal
https://github.com/w3c/dxwg/wiki/Data-aspects-semantics
empo:Dataset - metadata record
ssn-ext:ObservationCollection - ddi:DataCube
sosa:ObservableProperty - Concept or ConceptualVariable
sosa:FeatureOfInterest - ddi:UnitType or ddi:Population, will vary depending on sampling strategy.
sosa:UltimateFeatureOfInterest - ddi:Universe
sosa:SampledFeatureType - ddi:UnitType
Leave DataCapture/Procedure for now, other than to note:
Platform/Procedure/Sensor - ddi:DataCapture
and that having some indication of the observation procedure/DataCapture approach in the high level metadata will be important for assessment of fitness.
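The crosswalk above can be held as a simple lookup so that it can be queried in either direction. The term names are transcribed from the comparison as written; the dictionary layout and helper names are assumptions for illustration, and the Dataset and Platform/Procedure/Sensor rows are left out because the notes do not give fully prefixed terms for both sides.

```python
# SOSA/SSN term -> candidate DDI terms, as listed in the comparison above.
SOSA_TO_DDI = {
    "ssn-ext:ObservationCollection": ["ddi:DataCube"],
    "sosa:ObservableProperty": ["Concept", "ConceptualVariable"],
    "sosa:FeatureOfInterest": ["ddi:UnitType", "ddi:Population"],
    "sosa:UltimateFeatureOfInterest": ["ddi:Universe"],
    "sosa:SampledFeatureType": ["ddi:UnitType"],
}


def ddi_candidates(sosa_term):
    """Return the DDI terms listed for a SOSA/SSN term (empty if none)."""
    return SOSA_TO_DDI.get(sosa_term, [])


def sosa_terms_for(ddi_term):
    """Return the SOSA/SSN terms that list the given DDI term."""
    return sorted(
        sosa for sosa, ddi_terms in SOSA_TO_DDI.items() if ddi_term in ddi_terms
    )
```

The reverse lookup makes the one-to-many cases visible: `ddi:UnitType` appears under two SOSA terms, reflecting the note that the mapping varies with the sampling strategy.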
What would you like to incorporate into Variable information?
(Note - need to look at Larry's work on harmonising with CSV on the Web Variables)
From DDI InstanceVariable:
- unitOfMeasurement (but preferably more tightly typed)
- Universe and/or Population (UltimateFeatureOfInterest/FeatureOfInterest)
- UnitType (individual sample, SampledFeature)
- do the Units have identifiers? (to enable joining on unit with other data)
- SubstantiveValueDomain (or maybe ConceptualDomain)
- SentinelValueDomain?
- Capture (although may need to be abstracted)
Need to distinguish these for use - either as Discovery or Assessability
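As a rough sketch, the candidate items above could be carried in one variable-level record. The class, its field names, and the example values are hypothetical and are not DDI definitions; in particular, the discovery/assessability assignment in `FIELD_PURPOSE` is only an illustrative guess at the distinction the notes ask for.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VariableDescription:
    name: str
    unit_of_measurement: Optional[str] = None
    universe: Optional[str] = None
    unit_type: Optional[str] = None
    unit_identifier_scheme: Optional[str] = None    # do the units have identifiers?
    substantive_range: Optional[tuple] = None       # (minimum, maximum)
    sentinel_values: dict = field(default_factory=dict)  # code -> meaning
    capture: Optional[str] = None                   # e.g. a URI for the procedure

    def classify(self, value):
        """Label a raw value as sentinel, substantive, or out-of-domain."""
        if value in self.sentinel_values:
            return "sentinel"
        if self.substantive_range is None:
            return "substantive"
        low, high = self.substantive_range
        return "substantive" if low <= value <= high else "out-of-domain"


# Illustrative assumption only: which items serve which use.
FIELD_PURPOSE = {
    "universe": "discovery",
    "unit_type": "discovery",
    "unit_of_measurement": "assessability",
    "substantive_range": "assessability",
    "sentinel_values": "assessability",
    "capture": "assessability",
}

# Hypothetical variable with two sentinel codes.
pm25 = VariableDescription(
    name="pm25",
    unit_of_measurement="ug/m3",
    unit_type="household",
    substantive_range=(0.0, 500.0),
    sentinel_values={-99: "not measured", -98: "instrument failure"},
)
```

Keeping sentinel codes separate from the substantive range lets a consumer exclude them before aggregating, for example by filtering on `pm25.classify(value) == "substantive"`.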
Spatial information:
- "StructureGeographyView" includes relevant study level metadata
- Could we do this at a variable level?
- What is the spatial resolution of the unit/sampled feature location (need for space-based integration)
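Space-based integration of point-located data with admin regions can be sketched as a point-in-polygon lookup against region boundaries that carry validity dates. The region identifiers, coordinates, and dates below are hypothetical, and the ray-casting test is a simple stand-in for a proper GIS library; points exactly on a boundary are not handled specially.

```python
from datetime import date


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical region versions: (region id, valid from, valid to, boundary).
# In 2015 the "north" region is split and "east" is created.
REGION_VERSIONS = [
    ("north", date(2000, 1, 1), date(2015, 1, 1), [(0, 0), (4, 0), (4, 4), (0, 4)]),
    ("north", date(2015, 1, 1), date(9999, 12, 31), [(0, 0), (2, 0), (2, 4), (0, 4)]),
    ("east", date(2015, 1, 1), date(9999, 12, 31), [(2, 0), (4, 0), (4, 4), (2, 4)]),
]


def region_for(x, y, when):
    """Return the admin region containing a point on a given date, if any."""
    for region_id, valid_from, valid_to, boundary in REGION_VERSIONS:
        if valid_from <= when < valid_to and point_in_polygon(x, y, boundary):
            return region_id
    return None
```

The same sensor location resolves to different regions depending on the observation date, so the lookup needs both the point and the time of observation.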
Temporal information
- "TimeMethod" is study-level metadata (with a CV, linked to a Profile?)
- Need information at the variable level
- What is the temporal resolution of the observation time (need for time-based data integration)
Possible alignment of requirements with DDI Variable Cascade:
Requirement | Level |
---|---|
Discoverability | Conceptual |
Assessability | Represented |
Interoperability | Instance |
See also:
- Schema.org: variableMeasured
- FGDC CSDGM: Entity and Attribute information https://www.fgdc.gov/csdgmgraphical/entatt.htm
- ISO 19110
- ISO 11179-3
- ISO 19115 ContentInformation, ISO 19115-1 AcquisitionInformation
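For the Schema.org entry in the list above, a variable can be described in JSON-LD with `variableMeasured` and a `PropertyValue`. The sketch below builds such a record in Python; the dataset name, variable, and helper function are hypothetical, and only a few `PropertyValue` properties are shown.

```python
import json


def dataset_jsonld(name, variables):
    """Build a schema.org Dataset record with one PropertyValue per variable."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "variableMeasured": [
            {
                "@type": "PropertyValue",
                "name": variable["name"],
                "unitText": variable["unit"],
                "description": variable["description"],
            }
            for variable in variables
        ],
    }


# Hypothetical dataset with a single variable.
record = dataset_jsonld(
    "Hypothetical household air quality survey",
    [
        {
            "name": "pm25",
            "unit": "ug/m3",
            "description": "Fine particulate matter concentration per household",
        }
    ],
)

print(json.dumps(record, indent=2))
```

This covers the name and unit of a variable but has no obvious slot for sentinel values or the unit type, which is part of what the comparison with DDI is meant to surface.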