Workshop summary - FAIR vocabularies

Workshop Report: “Interoperability for Cross-Domain Research: FAIR Vocabularies”

Virtual, 27 September - 1 October, 2021, Australian time-zones

A workshop on the FAIR publication of technical vocabularies was held as a virtual event, with two daily sessions over four and a half days. The event was sponsored by CODATA (the Committee on Data of the International Science Council) and the Data Documentation Initiative Alliance (DDI), and supported by the Australian Research Data Commons (ARDC) and the Department of Agriculture, Water and the Environment (DAWE). It was organized by Rowan Brownlee (ARDC), Simon Cox (CSIRO), Kheeran Dharmawardena (DAWE), Steve McEachern (Australian National University), and Lesley Wyborn (Australian National University).

Controlled vocabularies are a key element of the semantic stack. The use of shared terminology is essential in allowing data to be compared or combined. Hence the provision of vocabularies in a FAIR form, so that they can be used and reused within and across applications and domains, is an important concern in building interoperable data systems.

The workshop brought together 25 participants from across Australia, representing a variety of disciplines and organizations across applications in geospatial data, earth and environmental science, official statistics, humanities and social science, health, and indigenous data, alongside a group of technology experts. Workshop activities were based around developing and testing practices for preparation, publication and maintenance of FAIR vocabularies. 

Participants were asked to bring examples of candidate ‘non-FAIR’ vocabularies. In the first activity (two days), four of these were selected for examination in small breakout teams, using the framework of the “Ten simple rules for making a vocabulary FAIR” (https://doi.org/10.1371/journal.pcbi.1009041). The vocabularies selected by the participants were: colour names (from the Munsell colour system), dwelling types (from AS 4590), historical police districts and administrative areas, and a small ecology classification (GBIF Establishment Means).

Few vocabularies remain fixed forever following their creation and publication. The second exercise (two days) therefore considered requirements and practices for the maintenance, revision and versioning of vocabularies, based on studying the same example vocabularies. The result was a template of concerns in governing and managing a FAIR vocabulary, consisting of six broad concerns: 1. Scope and context; 2. Stakeholders; 3. Content management; 4. Revision and change requests; 5. Implementation and communication of changes; 6. Persistence and sustainability. Each of these is subdivided into more specific details.
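As an illustration only, the six broad concerns could be held as a simple checklist that a vocabulary maintainer fills in over time. The sketch below is not the workshop's template: the `Concern` class, its field names and the `unaddressed` helper are hypothetical, and the more specific details under each concern are left empty because this report does not list them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concern:
    """One broad governance concern (hypothetical structure, for illustration)."""
    number: int
    title: str
    # Specific details are not listed in this report, so they start empty.
    details: List[str] = field(default_factory=list)


# The six broad concerns named in the workshop template.
GOVERNANCE_TEMPLATE = [
    Concern(1, "Scope and context"),
    Concern(2, "Stakeholders"),
    Concern(3, "Content management"),
    Concern(4, "Revision and change requests"),
    Concern(5, "Implementation and communication of changes"),
    Concern(6, "Persistence and sustainability"),
]


def unaddressed(template: List[Concern]) -> List[str]:
    """Return the titles of concerns that have no details recorded yet."""
    return [c.title for c in template if not c.details]
```

A maintainer could append notes to a concern's `details` list and call `unaddressed(GOVERNANCE_TEMPLATE)` to see which concerns still need attention.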

Finally, in a half-day session, a team of participants with experience in developing and configuring the technical platforms used to maintain and publish vocabularies compiled a list of FAIR vocabulary tools, tabulating a summary of their functions and their relative strengths and weaknesses.

A series of short presentations, focusing on content examples, vocabulary usage concerns, and FAIR technical platforms, introduced and framed the working sessions. These presentations triggered extensive discussion. The workshop agenda was kept flexible and was continuously adjusted to take maximum advantage of the engagement of the participants. 

More details of the participants and agenda, along with summaries of the main outcomes of the workshop, are available from the Cross-Domain Research FAIR Vocabularies workshop website.