Report: FAIRification Case Studies

Compiled by Simon J D Cox (CSIRO) simon.cox@csiro.au
from materials generated in the workshop by the contributors listed in each table.

Introduction

A recent paper by Cox et al. [2021], Ten simple rules for making a vocabulary FAIR, outlined a basic recipe for taking an existing vocabulary and making it FAIR (Findable, Accessible, Interoperable and Reusable). The rules in that paper were based on experiences with a number of legacy vocabularies.

The initial activity in the FAIR Vocabularies workshop was to look at some additional un-FAIR vocabularies submitted by the participants, and explore how the Ten Simple Rules might be applied. The results of this activity are captured in the following grids.

1. Colour Names

Name of vocabulary

Colour names

Scope (subject) of vocabulary

Colour names from the Munsell / ISCC-NBS Colour System (with equivalencies / translations into other colour spaces)

Vocabulary source

Book: Color : universal language and dictionary of names / Kenneth L. Kelly and Deane B. Judd

Available from the US government:

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nbsspecialpublication440.pdf

Other copies (we looked at several to see what we could learn about copyright)

https://ia801701.us.archive.org/9/items/coloruniversalla00kell/coloruniversalla00kell.pdf

https://archive.org/details/coloruniversalla00kell/page/n13/mode/2up

https://babel.hathitrust.org/cgi/pt?id=uc1.b4253551&view=1up&seq=5

https://www.google.com.au/books/edition/_/iskurh-fG7IC?hl=en&gbpv=0

https://www.google.com.au/books/edition/_/0nbW37ScDAoC?hl=en&sa=X&ved=2ahUKEwi07dOfvp7zAhX73jgGHcfiBVgQ8fIDegQICBAE

Article: ‘sRGB Centroids for the ISCC-NBS Colour System’ (for conversion to sRGB)

https://www.munsellcolourscienceforpainters.com/ColourSciencePapers/sRGBCentroidsForTheISCCNBSColourSystem.pdf

The ISCC-NBS Colour System by Paul Centore

https://www.munsellcolourscienceforpainters.com/ISCCNBS/ISCCNBSSystem.html

Munsell Renotation Data

https://www.rit.edu/science/munsell-color-science-lab-educational-resources#munsell-renotation-data

Munsell and Kubelka Munk toolbox (Octave/MatLab scripts)

https://www.munsellcolourscienceforpainters.com/MunsellAndKubelkaMunkToolbox/MunsellAndKubelkaMunkToolbox.html

Munsell resources

https://www.munsellcolourscienceforpainters.com/MunsellResources/MunsellResources.html

Convert Munsell to hexadecimal (MIT licensed R package)

https://rdrr.io/cran/munsell/

‘An ISCC-NBS Colour List for roloc’ (roloc is an R colour naming package): https://www.stat.auckland.ac.nz/~paul/Reports/roloc/NBS/roloc-nbs.html

*All Munsell colour designations translated to ISCC-NBS colours can be found in ISCCNBSDesignators.txt (uploaded; see: https://www.munsellcolourscienceforpainters.com/ColourSciencePapers/ColourSciencePapers.html, data files for ‘sRGB Centroids for the ISCC-NBS Colour System’)

BGS (excel sheet): https://webapps.bgs.ac.uk/data/vocabularies/viewdata.cfm?name=DIC_MUNSELL_COLOUR

NOAA PaST Thesaurus: https://www.ncdc.noaa.gov/paleo-search/cvterms?termId=3451

Who contributed to this case-study?

Shawn Ross,

Mark Lindsay,

Megan Wong,

Anusuriya Devaraju,

Rowan Brownlee

  1. Who is the custodian of the vocabulary?
    (Community, organization, etc)

None, from a 1976 book - approach the Munsell Color Lab at RIT: https://www.rit.edu/science/munsell-color-lab - scheduling meeting now

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?
e.g. Creative Commons, Licentia

HathiTrust has digitised Color : universal language and dictionary of names. Rights for 1976 are specified as public domain, ‘digitised by Google’ (use of the OCR images themselves is restricted, but ‘There are no restrictions on use of text transcribed from the images, or paraphrased or translated using the images’). Full catalogue record here. Reuse conditions specified here. We need to confirm international public domain status, including the ability to produce derivative works. We also need to check the ability to cross-reference / convert commercial systems like Munsell and Pantone.

CC-0 - deliver attribution in payload

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

 

Yes - each level defines a 3D volume in the Munsell colour space (translation to other colour spaces may be necessary)

Yes - at all levels, the definitions are complete (non-overlapping 3D volume in the Munsell colour space)

Yes - subsets across six levels, no overlap, no gaps

Yes - to other colour spaces

Some - Pantone is a (commercial) vocab; others are values in other colour spaces (e.g., sRGB, various CIE systems (RGB, LAB, XYZ, xyY), CMYK, hexadecimal, Pantone, RAL, CIELUV). Do we provide the centroid only, or define the limits of each 3D volume?

See table below for summary of vocabulary structure:

 

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

Initial work can be done in a GitHub repository I (SAR) created:

https://github.com/saross/colour-names/

Point of truth will be reflection of vocabulary sources. If relationships between vocabularies are created and need to be governed - will need process/governance clarified

(we can clone this repository to an organisational GitHub - or someplace else - later)

Maintenance history is visible (make good commit comments!).

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

Design a pattern(s) to be used for each term URI

Vocab will be submitted to Research Vocabularies Australia - what domain? (Discuss with RIT?)

Opaque UUIDs / GUIDs as basis of URI?

Persistence scheme - w3id, doi, handle, ..?
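The opaque-identifier option could be explored with a sketch like the one below. It mints one URI per term from a UUID. The base URI `https://example.org/def/colour-names/` is a placeholder (the real domain and persistence scheme are still open), and the choice of a name-based UUID (version 5), so that the same label always yields the same identifier, is an assumption rather than a workshop decision.

```python
import uuid

# Placeholder base URI: the real domain is still to be decided.
BASE_URI = "https://example.org/def/colour-names/"


def mint_term_uri(label: str, base_uri: str = BASE_URI) -> str:
    """Return an opaque, repeatable URI for a term label."""
    # Derive a namespace from the base URI, then hash the normalised label
    # inside it. Re-running the script gives the same URI for the same term.
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, base_uri)
    return base_uri + str(uuid.uuid5(namespace, label.strip().lower()))


if __name__ == "__main__":
    for label in ["brown", "yellowish brown", "light yellowish brown"]:
        print(label, "->", mint_term_uri(label))
```

A random UUID (`uuid.uuid4()`) would be equally opaque, but the mapping from label to URI would then have to be stored somewhere, because it could not be regenerated.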

 

 

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms

Richer semantics? Can you create an OWL Ontology or RDF Shapes description?

Model: SKOS or OWL?

Likely SKOS, and OWL as required - to satisfy user needs
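As a rough illustration of the SKOS option, the sketch below writes three terms from the example row of the vocabulary structure table (question 3) as Turtle, using only the Python standard library. The base URI, the `cn:` prefix, the `scheme` identifier and the `L<level>-<number>` local names are all assumptions made for the example; level prefixes are used because the same number may occur at more than one level.

```python
BASE = "https://example.org/def/colour-names/"  # placeholder base URI

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    f"@prefix cn: <{BASE}> .\n"
)

# (level, number, label, parent as (level, number) or None)
TERMS = [
    (1, 4, "brown", None),
    (2, 9, "yellowish brown", (1, 4)),
    (3, 76, "light yellowish brown", (2, 9)),
]


def local_name(level: int, number: int) -> str:
    """Hypothetical local-name pattern: level prefix plus the term number."""
    return f"L{level}-{number}"


def to_turtle(terms) -> str:
    """Serialise the terms as SKOS concepts linked by skos:broader."""
    blocks = [
        PREFIXES,
        'cn:scheme a skos:ConceptScheme ;\n    skos:prefLabel "Colour names"@en .\n',
    ]
    for level, number, label, parent in terms:
        lines = [
            f"cn:{local_name(level, number)} a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            f'    skos:notation "{number}" ;',
            "    skos:inScheme cn:scheme ;",
        ]
        if parent is None:
            # Level 1 terms sit at the top of the hierarchy.
            lines.append("    skos:topConceptOf cn:scheme .")
        else:
            lines.append(f"    skos:broader cn:{local_name(*parent)} .")
        blocks.append("\n".join(lines) + "\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_turtle(TERMS))
```

The output can be loaded into any Turtle-aware tool for checking before anything is submitted to a registry.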

 

7. Complete the vocabulary metadata:
some options: DCAT, LOV metadata guidelines, OMV, MOD

Metadata properties :

Colour names (levels) definitions

  • Hue

  • Value

  • Chroma

  • CIELAB colour cross-reference (centroid of Munsell)

  • scopeNote (maybe)

  • Administrative metadata? (maintainer, etc.)

Some to consider, eg -

  • dcterms:accrualPeriodicity

  • dcterms:contributor

  • dcterms:creator

  • dcterms:identifier

  • dcterms:modified

  • dcterms:description

  • dcterms:title

  • dcterms:license

  • dcterms:publisher

  • dcterms:rights

  • dcterms:source

  • rdfs:label

  • rdfs:seeAlso

8. Where will you register the vocabulary?
Some options: RVA, LOV, BioPortal, Agroportal, BARTOC, OBO, CESSDA

RVA? Registered and accessible at multiple service / dissemination points?

Make discoverable in multiple places - where identified communities and members likely to use the resource may look

9. How will you make the vocabulary accessible?

E.g., hosting via RVA, organized into collections.

Identifying the user groups, custodian - their roles and responsibilities - will help determine required HTTP configuration for the vocabulary domain (rule 5)

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Potential maintainers : ANZSoil, Geoscience Australia in partnership with e.g., TERN, Australian Archaeologic Society

Enquire with RIT Munsell Colour Laboratory (if they want a role or can contribute resources)

Some sort of institution that can manage the vocab over the long term.

Engagement through AU Vocab Interest Group

Maintenance and governance - evaluated to suit need of custodian and users

Vocabulary structure (question 3)

Level 1 (Munsell / ISCC-NBS): 4 | brown

Level 2 (Munsell / ISCC-NBS): 9 | yellowish brown

Level 3 (Munsell / ISCC-NBS): 76 | light yellowish brown

Level 4 (Munsell / ISCC-NBS): Munsell 10YR 6/4

Colour space / translation: sRGB, CIELAB, CIExyY, CIERGB, CIEXYZ, CMYK, RAL (DE), PCCS (JP), Hex, Pantone, ??, Centroid?, CIELUV

[colour]

[colour]

 

 

 

 

[colour]

 

 

 

 

 

Geological example using Pantone (from https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/21883) - link to standard colours archive file at right on that page - contains an xls for mapping (Pantone and RGB)

Other schemas

Munsell colour palette: http://pteromys.melonisland.net/munsell/

https://stackoverflow.com/questions/3620663/color-theory-how-to-convert-munsell-hvc-to-rgb-hsb-hsl

http://purl.obolibrary.org/obo/NCIT_C37927

Communities - potential users

Needs of soil community

Mostly use Munsell or colour names from an undefined schema. As well as RGB and Munsell and conversion between them, some in the soil community may use: https://www.worldcat.org/title/revised-standard-soil-color-charts/oclc/255574307?referer=di&ht=edition as in National standard yellow book http://anzsoil.org/def/au/asls/soil-profile/colour

And see http://anzsoil.org/def/au/asls/soil-profile/colour-stat

And see p 149 https://www.publish.csiro.au/ebook/download/pdf/8016

US use Pantone

Geology

https://github.com/CSIRO-enviro-informatics/interactive-geological-timescale

Colour mapping (RGB and CMYK) https://stratigraphy.org/ICSchart/CGMW_ICS_colour_codes.xlsx

Making map representations accessible to the colour blind (e.g. avoid the colour combinations shown in the Ishihara test plates)

Ishihara Test for Color Blindness: https://www.colour-blindness.com/colour-blindness-tests/ishihara-colour-test-plates/

Farnsworth D-15 Dichotomous Color Blindness Test: https://www.color-blindness.com/color-arrangement-test/

USGS interesting mix of CMYK and qualitative descriptions https://pubs.usgs.gov/tm/2005/11B01/05tm11b01.html#heading155113810

 

Soils community incl industry, organisations, states territories, ANZIS https://github.com/ANZSoilData - use Munsell (as x 3 separate concepts, but more likely store results in DBases as concatenated)
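Since results are likely to be stored as one concatenated string while the vocabulary treats hue, value and chroma as separate concepts, a small parser may help when matching database values to terms. The sketch below assumes the common written form `10YR 6/4` (hue number, hue family, value, slash, chroma) and `N 5/` for neutrals. The function and class names are hypothetical, and real data will probably need more tolerant handling.

```python
import re
from typing import NamedTuple, Optional


class Munsell(NamedTuple):
    hue: Optional[str]  # e.g. "10YR"; None for neutral (achromatic) colours
    value: float
    chroma: float


_NUM = r"(\d+(?:\.\d+)?)"
_CHROMATIC = re.compile(
    rf"^\s*{_NUM}\s*(R|YR|Y|GY|G|BG|B|PB|P|RP)\s+{_NUM}\s*/\s*{_NUM}\s*$",
    re.IGNORECASE,
)
_NEUTRAL = re.compile(rf"^\s*N\s*{_NUM}\s*/?\s*$", re.IGNORECASE)


def parse_munsell(text: str) -> Munsell:
    """Split a concatenated Munsell string into hue, value and chroma."""
    match = _CHROMATIC.match(text)
    if match:
        number, family, value, chroma = match.groups()
        return Munsell(number + family.upper(), float(value), float(chroma))
    match = _NEUTRAL.match(text)
    if match:
        # Neutral colours have no hue and zero chroma.
        return Munsell(None, float(match.group(1)), 0.0)
    raise ValueError(f"not a recognised Munsell notation: {text!r}")


if __name__ == "__main__":
    print(parse_munsell("10YR 6/4"))
    print(parse_munsell("N 5/"))
```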

Geoscience Australia - Irina - rock colour

On discussion list - AVSIG, AGLDWG National, Research Data Alliance

SKOS provides a ‘collection’ - a subset for a domain, e.g., soil colours?

To do

  1. Interest/Governance follow up (Shawn)

  2. Touch base with some main user groups to see what they use/what content they would like/resolution required/how they would like the information represented - Soils Peter W/Andrew Biggs (NCST) (Megan), GA Irina? (Mark - do you want to follow up? [mdl] Yep - I’ll do that and also ask CGI )

  3. Make content as spreadsheet - at least most basic groupings (Shawn)

  4. Convert to SKOS (Anu)

  5. Trial on RVA test infrastructure (Rowan)

Questions for RIT:

Why are two colours missing from L2 when compared to ISCC-NBS (page 4)? (brownish orange; brownish pink) - Ok, the page A-7, Table 1 list is what is captured here - while Page 4 table 1 is ‘Hue names’ - which overlap but are different?

Why are the colours ordered differently from the ISCC-NBS system? Does it mean something?

To derive:

(Color: universal language and dictionary of names, Table 1, page A7)

Level 4 = 1 Hue step, 1 Value step, and 2 Chroma steps

Level 5 = 0.5 Hue step, 0.1 Value step, and 0.25 Chroma step

Level 6 = 0.1 Hue step, 0.05 Value step, and 0.1 Chroma step
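One way to read the step sizes above is as a grid onto which any Munsell coordinate can be snapped for a given level. The sketch below takes that reading. It treats hue as a plain number (the step within one hue family) and does not handle wrap-around between families or trimming to real colours, both of which are open questions in these notes. Exact fractions are used so that steps such as 0.1 do not pick up floating-point error.

```python
from fractions import Fraction

# Step sizes per level as (hue, value, chroma), copied from the list above.
LEVEL_STEPS = {
    4: (Fraction(1), Fraction(1), Fraction(2)),
    5: (Fraction("0.5"), Fraction("0.1"), Fraction("0.25")),
    6: (Fraction("0.1"), Fraction("0.05"), Fraction("0.1")),
}


def snap(x, step: Fraction) -> Fraction:
    """Round x to the nearest multiple of step (ties go to the even multiple)."""
    return round(Fraction(x) / step) * step


def snap_to_level(hue_number, value, chroma, level: int):
    """Snap a (hue number, value, chroma) triple to the grid of one level.

    Pass the numbers as strings or Fractions (e.g. "9.6") to keep them exact.
    """
    steps = LEVEL_STEPS[level]
    return tuple(snap(x, s) for x, s in zip((hue_number, value, chroma), steps))


if __name__ == "__main__":
    for level in sorted(LEVEL_STEPS):
        snapped = snap_to_level("9.6", "6.2", "4.7", level)
        print(level, [float(x) for x in snapped])
```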

How do we ‘trim’ the results? With the ‘real’ values from the 1943 renotation? Or just use the renotation data and interpolate where needed? Hue is really the only problem

Plus: ensure that all colours in the RIT resource and in the XRite resource are represented. Does that mean we just do all 0.25 hue steps?

Collapse levels 5-6 and just express as a decimal? Or keep Level 5 (generate it programmatically?), and Level 6 becomes

2. AS 4590 - dwelling and address types

 

Name of vocabulary

AS 4590 - Address codes - “sub-dwelling unit-type”

 

Scope (subject) of vocabulary

Dwelling and address types

Who uses this vocabulary and what for?

Many, though some of them don’t actually know that they are

Vocabulary source

AS 4590:2017 (same as AS 4590:2006) - csv file extracted here https://drive.google.com/drive/u/0/folders/1pVZCBhlKw-ivzURn6XCUvVq3fijXHOHq

 

Similar to G-NAF flat-types (AS 4819) (possibly source)

 

Compare with ABS

“Functional Classification of Buildings”

Also Australian building-code

Who contributed to this case-study?

Nick Car

Irina Bastrakova

Jenny Long

Edmond Chuc (day 1)

Simon Cox

Michael Biddington

Armin Haller (day 2)

Jason Atkinson (day 2)

  1. Who is the custodian of the vocabulary?
    (Community, organization, etc)

Standards Australia IT-027 (“Data management and interchange”) but IT-004 also has some role (listed by Standards Australia as the “publisher”).

 

Shadows ISO/IEC JTC1/SC 32
- is this vocabulary actually maintained by ISO/IEC?

 

Also see ABS and Australian building regs which have related codelists designed for different needs.

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?

This content appears to be a subset of the ABS

“Functional Classification of Buildings” - maybe have the ABS vocab as the FAIRification target here, and then provide an interface to the 4590 subset for those users who need it.

 

Alert the owners that we want to FAIRify, but move ahead anyway.

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

Labels: yes. Definitions: no. Don’t know where definitions could come from (these are common terms)

N/A

No - flat list

No

No, but it seems to either be derived from AS 4819 “Flat Types” (https://gnafld.net/def/gnaf/code/FlatTypes) or vice versa

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

 

 

The standards document, we presume

No

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

Design a pattern(s) to be used for each term URI

https://linked.data.gov.au/def/?? - assumed

Would be easy for this flat list if we can establish a vocab ID. If that was as4590-sdut (Sub-dwelling Unit Type), given we have codes, we might have for Studio: https://linked.data.gov.au/def/as4590-sdut/stu
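The URI pattern suggested above can be sketched in a few lines. The base URI is the one assumed in these notes and `as4590-sdut` is only the proposed vocabulary ID. The CSV column names `code` and `label` are hypothetical (the extracted file may be laid out differently), and the single sample row reflects the Studio example given above.

```python
import csv
import io
from urllib.parse import quote

BASE = "https://linked.data.gov.au/def/"  # assumed, as noted above
VOCAB_ID = "as4590-sdut"                  # proposed vocab ID, not yet established


def term_uri(code: str) -> str:
    """Build a term URI from a code: base / vocab ID / lower-cased code."""
    return f"{BASE}{VOCAB_ID}/{quote(code.strip().lower(), safe='')}"


def uris_from_csv(text: str) -> dict:
    """Map each label in a 'code,label' CSV to its term URI."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["label"]: term_uri(row["code"]) for row in reader}


if __name__ == "__main__":
    sample = "code,label\nSTU,Studio\n"  # hypothetical layout, one real example
    for label, uri in uris_from_csv(sample).items():
        print(label, "->", uri)
```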

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms

Richer semantics? Can you create an OWL Ontology or RDF Shapes description?

 

Easy peasy

No: no explicit relations known

7. Complete the vocabulary metadata:

 

8. Where will you register the vocabulary?
Some options: RVA, LOV, BioPortal, Agroportal, BARTOC, OBO, CESSDA

RVA - this is an Australian govt thing

9. How will you make the vocabulary accessible?
(i.e. make vocab and term URIs resolve)

Via Aust Gov Linked Data WG URIs to RVA (likely)

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Likely by the standards maintainers - govt IT-027 committee - but this is not a task they have agreed to yet.

3. Police Districts

Name of vocabulary

Historical Police Districts

Scope (subject) of vocabulary

List of names of police districts from 1921-1950

Vocabulary source

Various: Book, spreadsheet, government report, etc. derived from past work on the census papers now in the archives.

Who contributed to this case-study?

Len Smith

Sandra Silcot

Susan Birtles

Steven McEachern (scribe)

  1. Who is the custodian of the vocabulary?
    (Community, organization, etc)

Various state archives (original lists)

Not yet existing (for the “shared” vocabulary)

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?

None - doesn’t exist officially yet, but there may be a license on the documents

See above

Depends on a and b

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

 

In part. Police district names for each state/year with links to NAA images are mostly known. Those unlinked require checking to see if the district was renamed/merged/removed. Definitions of the geographical boundaries lie in the archives and need transcription - this may come later. As a first step it is planned to match to current police stations (giving lat/lon) as a proxy.

Yes.

Under development.

Yes.

Yes – to the ‘place types’ vocab and to police stations (in non-vocab form) in State repositories.

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

4. The national archive - this is an historical record - and as such will be static except for when additional historical sources come to hand or corrections/extensions are made.

Maintenance history mechanism to be determined - more likely to publish new editions from time to time.

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

Design a pattern(s) to be used for each term URI

5. Under development. Have obtained a NAAN from n2t.net which will enable experimentation prior to publication. Need to find a long term home for the published knowledge graph and vocab arising from it. RVA for the vocab most likely.
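For experimentation with the NAAN, ARK identifiers and their n2t.net resolver URLs can be put together as in the sketch below. The NAAN value `99999`, the name `pd0001` and the restriction of names to lower-case letters and digits are all placeholders and assumptions for illustration. They are not the project's actual NAAN or naming scheme.

```python
import re

RESOLVER = "https://n2t.net/"
NAAN = "99999"  # placeholder: substitute the NAAN actually assigned


def ark_for(name: str, naan: str = NAAN) -> str:
    """Return an ARK of the form ark:/NAAN/name."""
    # Assumed local convention: names use lower-case letters and digits only.
    if not re.fullmatch(r"[0-9a-z]+", name):
        raise ValueError(f"unexpected characters in ARK name: {name!r}")
    return f"ark:/{naan}/{name}"


def resolver_url(ark: str) -> str:
    """Return the URL through which the resolver would redirect the ARK."""
    return RESOLVER + ark


if __name__ == "__main__":
    print(resolver_url(ark_for("pd0001")))  # hypothetical district identifier
```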

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms

Richer semantics? Can you create an OWL Ontology or RDF Shapes description?

B. Looking to construct an OWL ontology(ies) covering 3 aspects:

Historical reality

The traces of that reality revealed in the archive

Representations of research arising from the archive and other knowledge (Len Smith’s monograph) – tables/

 

7. Complete the vocabulary metadata:

Plan for extensive description within the ontologies and vocabs with dublin core terms as appropriate. If not adequate, supplement with DCAT.

8. Where will you register the vocabulary?

In RVA.

9. How will you make the vocabulary accessible?

RVA for the vocab. We also have an experimental website at rdxx.org which, via the ARK system (see n2t.net), will be able to deliver linked data from the research database/graph experimentally (Jena/Fuseki planned) using PIDs.

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Via collaboration and consultation with ICSM in Australia, we will seek to use this application as a “demonstrator” for adding temporal change to place types and place name instances - possibly contributing to a historical version of the placenames gazette.

Background:

https://rdxx.org/case-studies/dagstuhl-aic-policedistricts/

Aims:

  1. Authoritative list of the police districts of Australia for the period 1921-1950

  2. How they were used in the Aboriginal Censuses from that period (which were counted by police district)

  3. To infill those areas that were not included in the count of a given year

  4. To align those areas with other names for the places that are being described

Source information:

For example, police annual reports:

https://media.opengov.nsw.gov.au/pairtree_root/24/66/7f/b1/ef/d5/98/38/82/a4/96/65/90/96/f5/e8/obj/document.pdf

Comments:

The 10 steps presume the existence of an UNFAIR vocabulary.

What would the first three steps be in the absence of the existing vocabulary?

Steps 1-3 probably have an A and a B version depending on the existence of the vocabulary.

  1. Create the vocabulary

  2. Establish a suitable license (including notions around re-use - including “you may not want to use this yet”)

  3. Establish the set of terms to be used

Len and Sandra tried to establish a current list - based on current police stations. But there isn’t an authoritative version for the current day either. (It depended on the state - Qld had a CSV, NSW had an ARCGIS client but no extract, Vic was accessible through a DBF file).

Note that the primary interest here is something different - reconstructing the past. Here is the set of “historical police district names”. Point of authority is the censuses of the period in time. Make it distinct from the set of mission names. Would ideally tie it into records in the historical sources, such as National Archives records. (QUESTION: is this a PLACE NAME or GEOGRAPHIC AREA.)

Include a placeholder for the future “anticipated” set of historical police districts. (Noting that that doesn’t exist yet.)

Also try to establish relationship between past and present entities (e.g. POLICE DISTRICT - past and POLICE STATION - current)

Place types:

  • POLICE DISTRICTS

  • POLICE STATIONS

  • POLICE DIVISIONS

  • MISSIONS

Each of those things has a CURRENCY in NAME and LOCATION.

There is a parallel case in Susan’s work.

Qld Disaster Management - Local, District and State levels. (Local is LOCAL COUNCIL, District is POLICE DISTRICT, State is STATE).

Escalation goes from Local > District > State.

There is alignment between historical and current types.

(There’s also a parallel in Politics (Steve - ANZLEAD) in ELECTORATES)

(Historical Census data also includes POLICE DISTRICTS, LOCAL GOVERNMENT AREAS and ...?)

4. Administrative areas

Name of vocabulary

Administrative areas

Scope (subject) of vocabulary

Government defined admin areas.

With an initial focus on those described as collection areas for historical census records.

Vocabulary source

Nil.

The areas themselves are expected to be defined and named under various legislative mandates by government bodies so individual terms and definitions will have sources, but a collective vocabulary will not.

Who contributed to this case-study?

Len Smith

Sandra Silcot

Susan Birtles

Steven McEachern (scribe)

(Using research on historical police districts as an example)

  1. Who is the custodian of the vocabulary?
    (Community, organization, etc)

No known vocabulary in existence.

A custodian will need to be found, or custodianship accepted by those who need it.

(Probably has origins in a “community development” for now)

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?
e.g. Creative Commons, Licentia

 

n/a (but possibly leveraging source vocabularies - ASGS, ASGC, AEC, ...)

(https://en.wikipedia.org/wiki/Lands_administrative_divisions_of_Australia )

Is this an extension of the ICSM Place Types (https://vocabs.ardc.edu.au/viewById/260 )

n/a

CC?

NB - licensing and metadata might consider an indication that the vocab is project specific

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
I. Are there well defined subsets?
II. Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

 

Some source vocabularies do (e.g. ASGS, ICSM). Other sources would come from legislative sources

 

Yes for existing content.

 

Potential structure:

PLACE TYPE

SPATIAL CHARACTERISTICS (GEOMETRY?)

TEMPORAL CHARACTERISTICS

NAME

IDENTIFIER

CURRENCY

 

Currency vocabulary needed as well:

Current

Historical

Proposed

(Plus the TEMPORAL CHARACTERISTICS of the CURRENCY). Qld Gov: distinction between CURRENCY and APPROVED STATUS.
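The potential structure and the currency vocabulary above could be prototyped as a simple record type before any encoding is chosen. In the sketch below the field names, the use of WKT for geometry, the validity-date pair standing in for the temporal characteristics, and the sample record are all assumptions for illustration. The Qld distinction between currency and approved status is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Currency(Enum):
    CURRENT = "Current"
    HISTORICAL = "Historical"
    PROPOSED = "Proposed"


@dataclass(frozen=True)
class AdminArea:
    identifier: str
    name: str
    place_type: str                     # e.g. "POLICE DISTRICT"
    currency: Currency
    valid_from: Optional[date] = None   # temporal characteristics
    valid_to: Optional[date] = None
    geometry_wkt: Optional[str] = None  # spatial characteristics, if known

    def is_valid_on(self, day: date) -> bool:
        """True if the area existed on the given day (open ends allowed)."""
        starts_ok = self.valid_from is None or self.valid_from <= day
        ends_ok = self.valid_to is None or day <= self.valid_to
        return starts_ok and ends_ok


if __name__ == "__main__":
    # Hypothetical record; name and dates are invented for illustration only.
    area = AdminArea("example-001", "Example District", "POLICE DISTRICT",
                     Currency.HISTORICAL, date(1921, 1, 1), date(1950, 12, 31))
    print(area.name, area.currency.value, area.is_valid_on(date(1933, 6, 30)))
```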

 

And e)

Cross-references are likely (to cross-ref content from across the multiple source vocabularies)

4. Point of truth:

 

5. Web identifiers:

 

6. Encoding:

 

7. Complete the vocabulary metadata:

 

8. Where will you register the vocabulary?

RVA most likely to begin with (as initially for research purposes)

9. How will you make the vocabulary accessible?

 

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Who is responsible for CREATION? COMMUNITY GROUP (HASS-I?)

Who is responsible for CONTENT maintenance? COMMUNITY GROUP (HASS-I?). Need to define a CUSTODIAN.

Who is responsible for SYSTEM maintenance? (ARDC - RVA)


5. The Establishment Means GBIF Taxon

 

Name of vocabulary

Establishment Means GBIF Vocabulary

Scope (subject) of vocabulary

We narrowed down to this vocabulary on “Establishment means” from this GBIF list: https://rs.gbif.org/vocabulary/gbif/ We also noted that in this GBIF list there are other vocabularies that are available from multiple sources (e.g., Units of Measure, Reference, etc.)

Vocabulary source

https://rs.gbif.org/vocabulary/gbif/establishment_means.xml

Who contributed to this case-study?

Natalia Atkins,

Simon Wall,

Doug Palmer,

Lesley Wyborn,

Adrian Burton

  1. Who is the custodian of the vocabulary?
    (Community, organization, etc)

GBIF - custodian

 

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?

 

Licensing is not explicitly stated for vocabs, but other GBIF information is mostly CC0.

Doug to confirm with GBIF

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

 

Yes - Adrian says they are pretty good compared with other vocabs. Doug says community accepted: Specifies level of hierarchy in definition.

 

Yes

 

The vocabulary does not have an explicit hierarchy, the hierarchy is text in the definitions

There are no internal cross references (these are in text inside the definition)

There are no references to items in external vocabularies

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

 

Don’t know how the terms are generated or maintained. The vocabulary has an issue date, but this only tells when it was converted into XML form; there is no version control. There is a GitHub: https://rs.gbif.org/vocabulary/gbif/

See also: https://github.com/gbif/gbif-api/blob/dev/src/main/java/org/gbif/api/vocabulary/EstablishmentMeans.java

No - vocabulary maintenance is not visible, neither is the maintenance history and there is no provenance trail. This is probably a result of migration from code-based enumeration to a more vocabulary-based approach. (The ALA has a similar problem where vocabulary terms are buried in text file resources in code.)

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

{is there a} Design a pattern(s) to be used for each term URI

 

http://rs.gbif.org/vocabulary/gbif/establishment_means/ Yes

base/term

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms
(e.g using SKOS-Play! or PoolParty)

 

Richer semantics? Can you create an OWL Ontology or RDF Shapes description?

 

There is no SKOS, but could be turned into one

To have SKOS elements we would need to go into partnership with GBIF. GBIF have not yet published a SKOS instance, so we need a publisher to work with on this so that we can validate the SKOS version.

From a practical perspective, GBIF can translate URIs into standardised terms - they do plan to create a SKOS version.

You could create an OWL Ontology and RDF

The OWL ontology could (and maybe should) create sub-classes / sub-properties of skos:Concept, skos:broader etc. that reflect the specific domain being modelled.

Some of the other vocabularies, such as ranks, need domain-specific attributes, eg. isLinnaean for ranks.

It would be helpful to associate icons with terms on a per-application basis (eg. taxonomic status of excluded shows a no-entry sign in the NSL, TK labels have graphics associated with them)

7. Complete the vocabulary metadata:
some options: DCAT (W3C Data Catalog Vocabulary), LOV metadata guidelines (Metadata for Linked Open Vocabularies), OMV (BioPortal), MOD (Metadata for AgroPortal)

Currently do not use any of these metadata standards to Doug’s knowledge - they are not RDF. The XMLs use Dublin Core terms.

8. Where will you register the vocabulary?
Some options: RVA, LOV, BioPortal, Agroportal, BARTOC, OBO, CESSDA

Are already registered in RVA with pointers to GBIF (used by DAWE) See: https://vocabs.ardc.edu.au/search/#!/?activeTab=vocabularies&pp=15&q=GBIF but only raises awareness of the existence of the vocab.

ARDC has connections with Donald Hobern and Dave Martin: ARDC would become a secondary publication point.

GBIF have a performance-related desire to run in-house on a local server so they can crunch through billions of records for the data integration and aggregation that they do.

GBIF does need a public point of access.

GBIF does not have any connections with the other vocab registries listed.

9. How will you make the vocabulary accessible?
(i.e. make vocab and term URIs resolve)

They are accessible, but each term resolves back to a vocabulary hash term, rather than a vocabulary slash term. That is, you resolve to a fragment of a document, not to the exact term (is this acceptable?), e.g. https://rs.gbif.org/vocabulary/gbif/establishment_means/invasive maps onto https://rs.gbif.org/vocabulary/gbif/establishment_means.xml#invasive

For increased accessibility, each term needs its own page.
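The slash-to-hash behaviour described above amounts to a simple rewrite, sketched below for the one example given. The `.xml` suffix and the path layout are taken from that example only. Whether every GBIF vocabulary follows the same pattern is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_to_hash(term_uri: str) -> str:
    """Rewrite .../<vocabulary>/<term> as .../<vocabulary>.xml#<term>."""
    parts = urlsplit(term_uri)
    # Split the path into the vocabulary part and the final term segment.
    vocab_path, _, term = parts.path.rstrip("/").rpartition("/")
    if not vocab_path or not term:
        raise ValueError(f"no term segment in {term_uri!r}")
    return urlunsplit((parts.scheme, parts.netloc, vocab_path + ".xml", "", term))


if __name__ == "__main__":
    print(slash_to_hash(
        "https://rs.gbif.org/vocabulary/gbif/establishment_means/invasive"
    ))
```

Because the part after `#` is never sent to the server, a client asking for the hash form always receives the whole vocabulary document. This is why a page per term would need slash URIs that resolve individually.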

10. How will the FAIR vocabulary be maintained?

 

How will the custodians be involved?
How will the community be engaged and informed?

Need to engage with GBIF, need a governance document and guidelines like what Anu showed if it is to be maintainable. These documents need to be public!

Custodians can be involved by being maintainers

GBIF needs to open these up for the broader community. The assumption is that other people will reuse this, but initially they were developed for internal consumption. Need to get over a big hump from internal use to external access and work out how you will use it, better documentation, formal change management procedures etc (things implicit in management and maintenance). It is in GBIF’s interest to have community consensus around the terms, their definition and their upgrading.