Metadata for variables and attributes

discovery, assessability, integratability

Discovery could be multi-layered like:

  • discovery across domain-specific repositories, human interaction required
  • discovery in a portal which provides pre-selected data and possibly harmonized data, more automation might be possible in the search and discovery

Scenarios:

Temporal integration example: sensor data every 3 hours vs. clinical data aggregated by year
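
One way to picture the temporal case is to roll the finer-grained series up to the coarser one before comparing. The sketch below is only an illustration: the readings, the choice of a yearly mean, and the function name `aggregate_by_year` are all assumptions, not something agreed in the discussion.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_by_year(readings):
    """Average (timestamp, value) readings per calendar year."""
    by_year = defaultdict(list)
    for timestamp, value in readings:
        by_year[timestamp.year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Hypothetical 3-hourly sensor readings that cross a year boundary.
start = datetime(2019, 12, 31, 18, 0)
readings = [(start + timedelta(hours=3 * i), 10.0 + i) for i in range(4)]

# Yearly values can now be lined up with clinical data aggregated by year.
yearly_sensor = aggregate_by_year(readings)
```

The aggregation function (mean, max, count of exceedances) is itself metadata that a user would need in order to assess the integrated result.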

Spatial integration: point located data (sensors) vs. municipality/neighborhood (admin region).  Admin region definition might be time dependent.
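
A minimal sketch of the spatial case, assuming admin regions can be approximated by bounding boxes with a validity period. Real boundaries would be polygons; the `AdminRegion` class, the region names, and the dates are hypothetical and only show why the observation date has to be part of the lookup.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AdminRegion:
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    valid_from: date
    valid_to: date  # exclusive end of validity

    def contains(self, x, y, on):
        """True if the point lies in the box while the definition is valid."""
        in_time = self.valid_from <= on < self.valid_to
        in_space = self.min_x <= x < self.max_x and self.min_y <= y < self.max_y
        return in_time and in_space


def region_for(regions, x, y, on):
    """Return the name of the first region containing the point on that date."""
    for region in regions:
        if region.contains(x, y, on):
            return region.name
    return None


# Hypothetical: two municipalities merge on 2015-01-01.
regions = [
    AdminRegion("North", 0, 5, 10, 10, date(2000, 1, 1), date(2015, 1, 1)),
    AdminRegion("South", 0, 0, 10, 5, date(2000, 1, 1), date(2015, 1, 1)),
    AdminRegion("Merged", 0, 0, 10, 10, date(2015, 1, 1), date(9999, 1, 1)),
]

before = region_for(regions, 2, 7, date(2010, 6, 1))
after = region_for(regions, 2, 7, date(2020, 6, 1))
```

The same sensor location maps to different regions depending on the date, which is the time dependence noted above.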

Measurement alignment: units, conceptual, procedural
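
Of the three kinds of alignment, only units lend themselves to a mechanical fix. The sketch below assumes a small hand-written conversion table; the unit strings and the `align_units` helper are illustrative. Conceptual and procedural alignment are not addressed by it and would still need human judgement.

```python
# Hypothetical conversion factors: (source unit, target unit) -> multiplier.
CONVERSIONS = {
    ("ug/m3", "ug/m3"): 1.0,
    ("mg/m3", "ug/m3"): 1000.0,
}


def align_units(value, unit, target="ug/m3"):
    """Convert a value to the target unit, failing loudly on unknown units."""
    try:
        factor = CONVERSIONS[(unit, target)]
    except KeyError:
        raise ValueError(f"no conversion from {unit!r} to {target!r}") from None
    return value * factor


aligned = align_units(0.5, "mg/m3")
```

Raising on an unknown unit, rather than passing the value through, keeps a silent mismatch from entering the integrated data.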

sampling feature identification: e.g. identify households/individuals to correlate environmental and clinical data; also aggregated data and intentionally fuzzy data (fossil locations)

hand function reported at various times of the day: capture context of response 

how to deal with Sentinel (a la DDI) values
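
One possible treatment, sketched under the assumption that the sentinel codes are declared in the metadata: separate substantive values from sentinel values before any computation. The codes `-9` and `-8` and their meanings are made up for the example.

```python
# Hypothetical sentinel codes, as they might be declared in variable metadata.
SENTINELS = {-9: "refused", -8: "dont_know"}


def split_sentinels(values, sentinels):
    """Separate substantive values from the meanings of sentinel codes."""
    substantive, missing = [], []
    for value in values:
        if value in sentinels:
            missing.append(sentinels[value])
        else:
            substantive.append(value)
    return substantive, missing


substantive, missing = split_sentinels([3, -9, 5, -8, 4], SENTINELS)
```

Without the declared sentinel domain, `-9` would be indistinguishable from a real measurement and would distort any mean or range.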



questions:

find surveys that ask about party affiliation

find surveys that have asked questions like X (question reuse)

find surveys that have response ranges like Y

how many people came into the hospital with respiratory infections

find surveys with similar response populations



Solutions:

index questions

controlled vocabulary to classify question topics

include response value domain in metadata
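
The three solutions above could be combined in a simple question index. This is a rough sketch: the `Question` record, the topic terms, and the sample surveys are hypothetical, and a real index would draw topics from an agreed controlled vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    survey: str
    text: str
    topics: frozenset       # terms from a controlled vocabulary
    response_domain: tuple  # allowed response values, in order


def surveys_by_topic(questions, topic):
    """Surveys that contain at least one question classified under the topic."""
    return sorted({q.survey for q in questions if topic in q.topics})


def surveys_by_domain(questions, domain):
    """Surveys that contain a question with exactly this response domain."""
    return sorted({q.survey for q in questions if q.response_domain == tuple(domain)})


# Hypothetical index entries.
index = [
    Question("Survey A", "Which party do you feel closest to?",
             frozenset({"party-affiliation"}), ("left", "centre", "right")),
    Question("Survey B", "How satisfied are you with your health?",
             frozenset({"health"}), (1, 2, 3, 4, 5)),
    Question("Survey C", "Which party did you vote for?",
             frozenset({"party-affiliation", "voting"}), ("left", "centre", "right")),
]

party_surveys = surveys_by_topic(index, "party-affiliation")
likert_surveys = surveys_by_domain(index, [1, 2, 3, 4, 5])
```

Topic lookup covers the "party affiliation" style of question, and matching on the response domain covers "response ranges like Y".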


viewpoints (DDI).  

unit data records, aggregate records, object oriented/sparse data, network data structures

data model for data objects.  A data object can be the type for a variable.

Data objects are inputs to processes


integrating on sampling feature (same person, same house, same rock sample)

Integrate on property type
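
As an illustration of integrating on a shared sampling feature, the sketch below merges observations from two sources keyed by a feature identifier, with the property type as the second key. The identifiers, property names, and values are invented, and it assumes both sources already use the same feature identifiers.

```python
from collections import defaultdict


def integrate_by_feature(*sources):
    """Merge (feature_id, property, value) observations per sampling feature."""
    merged = defaultdict(dict)
    for source in sources:
        for feature_id, prop, value in source:
            merged[feature_id][prop] = value
    return dict(merged)


def complete_records(merged, required):
    """Keep only features that have a value for every required property."""
    return {
        feature_id: props
        for feature_id, props in merged.items()
        if set(required).issubset(props)
    }


# Hypothetical observations about the same houses from two sources.
environmental = [("house-1", "pm25", 12.0), ("house-2", "pm25", 30.5)]
clinical = [("house-1", "asthma_visits", 0), ("house-3", "asthma_visits", 2)]

merged = integrate_by_feature(environmental, clinical)
usable = complete_records(merged, {"pm25", "asthma_visits"})
```

Only `house-1` appears in both sources, so it is the only feature for which the two properties can be correlated.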

instance variable relation: ISO 11404 -- adjacency list; implement factory of data types


High level metadata:

Concepts for ddi:ConceptualVariable and ddi:UnitType (ssn:Property, ssn:FeatureOfInterest)

Capture information (URI for om:Procedure); ramify to instruments, protocols, sensors at property/variable/attribute level.

what else about variables?


Comparison with the DXWG dataset extension (to DCAT) proposal

https://github.com/w3c/dxwg/wiki/Data-aspects-semantics
empo:Dataset - metadata record
ssn-ext:ObservationCollection - ddi:DataCube
sosa:ObservableProperty - Concept or ConceptualVariable
sosa:FeatureOfInterest - ddi:UnitType or ddi:Population,  will vary depending on sampling strategy.
sosa:UltimateFeatureOfInterest - ddi:Universe
sosa:SampledFeatureType - ddi:UnitType

Leave DataCapture/Procedure for now, other than to note:
Platform/Procedure/Sensor - ddi:DataCapture

and that having some indication of the observation procedure/DataCapture approach in the high level metadata will be important for assessment of fitness.


What would you like to incorporate into Variable information?
(Note - need to look at Larry's work on harmonising with CSV on the Web Variables)
From DDI InstanceVariable:

- unitOfMeasurement (but preferably more tightly typed)
- Universe and/or Population (UltimateFeatureOfInterest/FeatureOfInterest)
- UnitType (individual sample, SampledFeature)

- do the Units have identifiers? (to enable joining on unit with other data)
- SubstantiveValueDomain (or maybe ConceptualDomain)
- SentinelValueDomain?
- Capture (although may need to be abstracted)
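
To make the list concrete, here is a hedged sketch of what a variable-level record carrying these items might look like. The class `VariableMetadata`, its field names, and the `joinable_on_unit` check are assumptions for illustration and are not taken from the DDI model itself; the example values and the URI are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableMetadata:
    name: str
    unit_of_measurement: str
    universe: str
    unit_type: str
    substantive_value_domain: str
    population: Optional[str] = None
    unit_identifier_scheme: Optional[str] = None  # None if units have no identifiers
    sentinel_value_domain: Optional[str] = None
    capture: Optional[str] = None  # e.g. a URI for the procedure


def joinable_on_unit(a, b):
    """Two variables can be joined on unit if they share type and identifier scheme."""
    return (
        a.unit_type == b.unit_type
        and a.unit_identifier_scheme is not None
        and a.unit_identifier_scheme == b.unit_identifier_scheme
    )


# Hypothetical variables from an environmental and a clinical dataset.
pm25 = VariableMetadata(
    name="pm25", unit_of_measurement="ug/m3", universe="all households",
    unit_type="household", substantive_value_domain="non-negative real",
    unit_identifier_scheme="household-id",
    capture="http://example.org/procedure/pm25-sensor",
)
visits = VariableMetadata(
    name="asthma_visits", unit_of_measurement="count", universe="all households",
    unit_type="household", substantive_value_domain="non-negative integer",
    unit_identifier_scheme="household-id", sentinel_value_domain="-9 = refused",
)
anonymous = VariableMetadata(
    name="income", unit_of_measurement="EUR", universe="all households",
    unit_type="household", substantive_value_domain="non-negative real",
)

can_join = joinable_on_unit(pm25, visits)
```

Recording whether units carry identifiers, and under which scheme, is what lets a consumer decide up front whether a join with other data is possible.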

Need to distinguish these for use - either as Discovery or Assessability

Spatial information:
- "StructureGeographyView" includes relevant study level metadata
- Could we do this at a variable level?

- What is the spatial resolution of the unit/sampled feature location (need for space-based integration)

Temporal information
- "TimeMethod" as a study level (with a CV - linked to a Profile??)
- Need information at the variable level

- What is the temporal resolution of the observation time (need for time-based data integration)

Possible alignment of requirements with DDI Variable Cascade:

Requirement        Level
Discoverability    Conceptual
Assessability      Represented
Interoperability   Instance


See also: