Controlled Vocabularies

Notes on Controlled Vocabularies

2014-10-23 Joachim Wackerow (meeting with Knut Wenzig, Justin Lynch, Sanda Ionescu)

Current Approach of DDI Controlled Vocabularies

  • Value of the Code

  • Descriptive Term of the Code

  • Definition of the Code

Hierarchy is expressed by a separator (“.”) in the code. The leftmost string is the highest level.

Both term and definition can be multi-lingual.

The American English expression is the canonical version.
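
The structure described above can be sketched in a few lines of Python. This is only an illustration, not the DDI representation: the class name `CvEntry`, the language tag `en-US` for the canonical version, and the sample code, term, and definition are assumptions made for the example.

```python
from dataclasses import dataclass, field

SEPARATOR = "."           # hierarchy separator used inside code values
CANONICAL_LANG = "en-US"  # assumed tag for the canonical American English version


@dataclass
class CvEntry:
    """One entry of a controlled vocabulary (hypothetical structure)."""
    code: str                                       # value of the code
    term: dict = field(default_factory=dict)        # language tag -> descriptive term
    definition: dict = field(default_factory=dict)  # language tag -> definition

    def levels(self):
        """Split the code into hierarchy levels; the leftmost is the highest."""
        return self.code.split(SEPARATOR)

    def parent_code(self):
        """Return the code of the parent entry, or None for a top-level code."""
        head, sep, _ = self.code.rpartition(SEPARATOR)
        return head if sep else None

    def term_in(self, lang):
        """Return the term in `lang`, falling back to the canonical version."""
        return self.term.get(lang, self.term.get(CANONICAL_LANG))


# Illustrative sample values, not taken from a published vocabulary.
entry = CvEntry(
    code="Interview.FaceToFace",
    term={"en-US": "Face-to-face interview", "de": "Persönliches Interview"},
    definition={"en-US": "An interview conducted in person."},
)
```

Here `entry.parent_code()` yields `"Interview"`, and `entry.term_in("fr")` falls back to the American English term because no French term is present.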

Current CVs in DDI 3

  • External controlled vocabularies

  • CodeList/CategoryScheme with nested codes for expressing hierarchy

  • Enumerated lists in XML Schema
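
The separator-based codes of the controlled vocabularies and the nested codes of a CodeList carry the same hierarchy in two different forms. As a rough sketch, and assuming nothing beyond the “.” separator convention, flat codes can be turned into a nested structure as follows. The function name `nest_codes` and the sample codes are hypothetical.

```python
def nest_codes(codes, separator="."):
    """Build a nested dict from flat codes whose hierarchy is encoded by `separator`."""
    tree = {}
    for code in codes:
        node = tree
        for level in code.split(separator):
            # Create missing levels on the way down, so parents need not be listed first.
            node = node.setdefault(level, {})
    return tree


# Hypothetical flat codes; the leftmost part of each code is the highest level.
flat_codes = ["A", "A.1", "A.1.x", "A.2", "B"]
nested = nest_codes(flat_codes)
```

Each dictionary key is one hierarchy level, and an empty dictionary marks a code without children.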

Multiple Purposes

  • Code lists

  • Classification

  • Controlled vocabularies

Requirements

  • General

    • Ideally only one type of structure for multiple purposes

    • Usage of existing structures preferred if possible (like SKOS in the Semantic Web)

    • Same structure for internal representation in the model/specification and external representation. Reasoning: processing is easier and the same software can be used; only the reference differs.

    • Validation of structure, keys, and possibly values

      • What should be validated: keys, values, relationship, dependency, requirement, …

      • XML Schema: what should be validated by the XML parser, and what in a secondary-level validation

  • Simple approach

    • Hierarchy

    • Multi-lingual text for term and definition

    • As simple as possible, easy to process

  • Complex

    • Requirement of items

    • Relationship of items

      • Use case: thesaurus

      • CV for keys, CV for values of key/value pair

Conclusion

Current sense: two different approaches are required. One is a simple CV structure similar to the approach of the current DDI Controlled Vocabularies; the other is a more complex approach for additional requirements such as validation and the definition of requirements.

Ideally, the complex approach could make use of the simple approach.
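
One way to picture the complex approach building on the simple one is a secondary-level validation of key/value pairs, in which plain code sets stand in for simple CVs. The sketch below is an assumption-laden illustration, not a proposed design: the function `validate_pairs`, its parameters, and all sample keys and values are hypothetical.

```python
def validate_pairs(pairs, key_cv, value_cvs, required_keys=()):
    """Return a list of problems found in `pairs` (an empty list means valid).

    key_cv        -- set of codes allowed as keys (a simple CV)
    value_cvs     -- key -> set of codes allowed as values for that key (simple CVs)
    required_keys -- keys that must be present
    """
    problems = []
    for key, value in pairs.items():
        if key not in key_cv:
            problems.append(f"unknown key: {key}")
        elif key in value_cvs and value not in value_cvs[key]:
            problems.append(f"value {value!r} not allowed for key {key}")
    for key in required_keys:
        if key not in pairs:
            problems.append(f"missing required key: {key}")
    return problems


# Hypothetical CVs: one for the keys, one for the values of the key "mode".
key_cv = {"mode", "unit"}
value_cvs = {"mode": {"Interview", "Interview.FaceToFace"}}

ok = validate_pairs({"mode": "Interview.FaceToFace"}, key_cv, value_cvs,
                    required_keys=["mode"])
bad = validate_pairs({"mode": "Phone", "colour": "red"}, key_cv, value_cvs,
                     required_keys=["unit"])
```

In this sketch the simple CVs stay unchanged; the requirement and relationship rules live in a separate layer that only references them.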