Controlled Vocabularies
Notes on Controlled Vocabularies
2014-10-23 Joachim Wackerow (meeting with Knut Wenzig, Justin Lynch, Sanda Ionescu)
Current Approach of DDI Controlled Vocabularies
- Value of the Code
- Descriptive Term of the Code
- Definition of the Code
Hierarchy is expressed by a separator (“.”) in the code. The leftmost segment is the highest level.
Both term and definition can be multilingual.
The American English expression is the canonical version.
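A minimal sketch of one entry in this simple structure, assuming Python as the illustration language. The class name `CVEntry`, the language tag `en-US` for the canonical version, and the sample code value are assumptions made for illustration; they are not taken from the DDI specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CANONICAL_LANG = "en-US"  # assumption: tag used for the canonical American English text
SEPARATOR = "."           # hierarchy separator inside the code value


@dataclass
class CVEntry:
    """Hypothetical model of one controlled-vocabulary entry."""
    code: str                                   # value of the code
    term: Dict[str, str]                        # descriptive term, keyed by language tag
    definition: Dict[str, str] = field(default_factory=dict)  # definition, keyed by language tag

    def levels(self) -> List[str]:
        """Split the code into hierarchy levels, highest level first."""
        return self.code.split(SEPARATOR)

    def parent_code(self) -> Optional[str]:
        """Return the code one level up, or None for a top-level code."""
        head, sep, _ = self.code.rpartition(SEPARATOR)
        return head if sep else None

    def term_in(self, lang: str) -> str:
        """Return the term in the requested language, falling back to the canonical one."""
        return self.term.get(lang, self.term[CANONICAL_LANG])


# Illustrative entry; the code value and texts are made up for this sketch.
entry = CVEntry(
    code="Interview.FaceToFace",
    term={"en-US": "Face-to-face interview", "de": "Persönliches Interview"},
    definition={"en-US": "Interview conducted in person."},
)
```

With this sketch, `entry.levels()` gives the hierarchy path, `entry.parent_code()` gives the code of the next higher level, and `entry.term_in("fr")` falls back to the canonical American English term because no French term is present.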
Current CVs in DDI 3
- External controlled vocabularies
- CodeList/CategoryScheme with nested codes for expressing hierarchy
- Enumerated lists in XML Schema
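The separator-based hierarchy of the DDI Controlled Vocabularies and the nested codes of a CodeList carry the same information. The following sketch shows one possible conversion from dotted codes to a nested structure. The function name `nest_codes` and the nested-dictionary representation are assumptions for illustration only, not the DDI 3 CodeList model.

```python
from typing import Dict, Iterable


def nest_codes(codes: Iterable[str], separator: str = ".") -> Dict[str, dict]:
    """Build a nested dictionary from flat codes whose hierarchy is encoded by a separator."""
    tree: Dict[str, dict] = {}
    for code in codes:
        node = tree
        # Walk down from the leftmost (highest) level, creating missing levels.
        for segment in code.split(separator):
            node = node.setdefault(segment, {})
    return tree


# Illustrative codes, made up for this sketch.
nested = nest_codes(["A", "A.B", "A.B.C", "D"])
```

Each key of the result is one level of a code, and its value holds the levels nested below it; a leaf is an empty dictionary.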
Multiple Purposes
- Code lists
- Classification
- Controlled vocabularies
Requirements
- General
  - Ideally only one type of structure for multiple purposes
  - Use of existing structures preferred where possible (such as SKOS in the Semantic Web)
  - Same structure for internal representation in the model/specification and for external representation. Reasoning: easy processing and the same software solution; only the reference differs.
  - Validation of structure, keys, and possibly values
    - What should be validated: keys, values, relationships, dependencies, requirements, …
    - XML Schema: what should be validated by the XML parser, and what in a second-level validation
- Simple approach
  - Hierarchy
  - Multilingual text for term and definition
  - As simple as possible, easy to process
- Complex approach
  - Requirement of items
  - Relationship of items
  - Use case: thesaurus
  - CV for keys, CV for values of key/value pairs
Conclusion
Current sense: two different approaches are required. The first is a simple CV structure, similar to the one used for the current DDI Controlled Vocabularies. The second is a more complex structure for additional requirements such as validation and defining requirements.
Ideally, the complex approach would build on the simple one.
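How the complex approach might reuse the simple one can be sketched as follows: a rule set for key/value pairs that refers to simple CVs for the permitted keys and for the permitted values of each key, plus a set of required keys. All names here (`SimpleCV`, `KeyValueRules`, `validate`) and the sample codes are hypothetical; this is only one possible reading of the notes above, not an agreed design.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SimpleCV:
    """Simple structure, reduced here to a named set of permitted codes."""
    name: str
    codes: Set[str]

    def contains(self, code: str) -> bool:
        return code in self.codes


@dataclass
class KeyValueRules:
    """Complex layer built on simple CVs: one CV for keys, one CV per key for values."""
    keys: SimpleCV
    values: Dict[str, SimpleCV]
    required: Set[str] = field(default_factory=set)  # keys that must be present

    def validate(self, pairs: Dict[str, str]) -> List[str]:
        """Return a list of error messages; an empty list means the pairs are valid."""
        errors: List[str] = []
        for key in sorted(self.required - set(pairs)):
            errors.append(f"missing required key: {key}")
        for key, value in pairs.items():
            if not self.keys.contains(key):
                errors.append(f"unknown key: {key}")
            elif key in self.values and not self.values[key].contains(value):
                errors.append(f"invalid value for {key}: {value}")
        return errors


# Illustrative rule set; keys and codes are made up for this sketch.
rules = KeyValueRules(
    keys=SimpleCV("keys", {"mode", "unit"}),
    values={"mode": SimpleCV("mode values", {"Interview", "Interview.FaceToFace"})},
    required={"mode"},
)
```

In this sketch the complex layer adds only requirements and key/value dependencies; hierarchy and multilingual text would remain in the simple structure, so the same software could process both.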