Report: FAIRification Case Studies

Compiled by Simon J D Cox (CSIRO) simon.cox@csiro.au
from materials generated in the workshop by the contributors listed in each table.

Introduction

A recent paper by Cox et al. [2021], ‘Ten simple rules for making a vocabulary FAIR’, outlined a basic recipe for taking an existing vocabulary and making it FAIR (Findable, Accessible, Interoperable and Reusable). The rules in that paper were based on experience with a number of legacy vocabularies.

The initial activity in the FAIR Vocabularies workshop was to look at some additional un-FAIR vocabularies submitted by the participants, and explore how the Ten Simple Rules might be applied. The results of this activity are captured in the following grids.

1. Colour Names

Name of vocabulary

Colour names

Scope (subject) of vocabulary

Colour names from the Munsell / ISCC-NBS Colour System (with equivalencies / translations into other colour spaces)

Vocabulary source

Book: Color : universal language and dictionary of names / Kenneth L. Kelly and Deane B. Judd

Available from the US government:

http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nbsspecialpublication440.pdf

Other copies (we looked at several to see what we could learn about copyright)

https://ia801701.us.archive.org/9/items/coloruniversalla00kell/coloruniversalla00kell.pdf

https://archive.org/details/coloruniversalla00kell/page/n13/mode/2up

https://babel.hathitrust.org/cgi/pt?id=uc1.b4253551&view=1up&seq=5

https://www.google.com.au/books/edition/_/iskurh-fG7IC?hl=en&gbpv=0

https://www.google.com.au/books/edition/_/0nbW37ScDAoC?hl=en&sa=X&ved=2ahUKEwi07dOfvp7zAhX73jgGHcfiBVgQ8fIDegQICBAE

Article: ‘sRGB Centroids for the ISCC-NBS Colour System’ (for conversion to sRGB)

https://www.munsellcolourscienceforpainters.com/ColourSciencePapers/sRGBCentroidsForTheISCCNBSColourSystem.pdf

The ISCC-NBS Colour System by Paul Centore

https://www.munsellcolourscienceforpainters.com/ISCCNBS/ISCCNBSSystem.html

Munsell Renotation Data

https://www.rit.edu/science/munsell-color-science-lab-educational-resources#munsell-renotation-data

Munsell and Kubelka-Munk toolbox (Octave/MATLAB scripts)

https://www.munsellcolourscienceforpainters.com/MunsellAndKubelkaMunkToolbox/MunsellAndKubelkaMunkToolbox.html

Munsell resources

https://www.munsellcolourscienceforpainters.com/MunsellResources/MunsellResources.html

Convert Munsell to hexadecimal (MIT-licensed R package)

https://rdrr.io/cran/munsell/

‘An ISCC-NBS Colour List for roloc’ (roloc is an R colour-naming package): https://www.stat.auckland.ac.nz/~paul/Reports/roloc/NBS/roloc-nbs.html

*All Munsell colour designations translated to ISCC-NBS colours can be found in ISCCNBSDesignators.txt (uploaded; see: https://www.munsellcolourscienceforpainters.com/ColourSciencePapers/ColourSciencePapers.html, data files for ‘sRGB Centroids for the ISCC-NBS Colour System’)

BGS (excel sheet): https://webapps.bgs.ac.uk/data/vocabularies/viewdata.cfm?name=DIC_MUNSELL_COLOUR

NOAA PaST Thesaurus: https://www.ncdc.noaa.gov/paleo-search/cvterms?termId=3451

Who contributed to this case-study?

Shawn Ross

Mark Lindsay

Megan Wong

Anusuriya Devaraju

Rowan Brownlee

1. Who is the custodian of the vocabulary?
(Community, organization, etc.)

None; the vocabulary comes from a 1976 book. Approach the Munsell Color Lab at RIT (https://www.rit.edu/science/munsell-color-lab); a meeting is being scheduled.

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?
e.g. Creative Commons, Licentia

HathiTrust has digitised Color: universal language and dictionary of names. The rights for the 1976 edition are specified as public domain, ‘digitised by Google’: use of the OCR images themselves is restricted, but ‘There are no restrictions on use of text transcribed from the images, or paraphrased or translated using the images’. The full catalogue record and the reuse conditions are published by HathiTrust. We need to confirm that the work is in the public domain internationally, including the ability to produce derivative works. We also need to check whether we can cross-reference or convert to commercial systems like Munsell and Pantone.

CC0, with attribution delivered in the payload

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

Yes - each term at every level defines a 3D volume in the Munsell colour space (translation to other colour spaces may be necessary)

Yes - at all levels, the definitions are complete (non-overlapping 3D volumes in the Munsell colour space)

Yes - subsets across six levels, with no overlaps and no gaps

Yes - to other colour spaces

Some - Pantone is a (commercial) vocabulary; the others are values in other colour spaces (e.g., sRGB, various CIE systems (RGB, LAB, XYZ, xyY), CMYK, hexadecimal, RAL, CIELUV). Open question: do we provide the centroid only, or define the limits of each 3D volume?
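If each term carries one sRGB centroid (for example from the ‘sRGB Centroids for the ISCC-NBS Colour System’ data files), some of the other encodings can be derived rather than stored. The Python sketch below assumes 8-bit sRGB input and a D65 white point. It converts a centroid to a hex code and to CIELAB using the standard sRGB transfer curve and the CIE 1976 L*a*b* formulas. The function names are illustrative only. Pantone and RAL are proprietary lookups rather than formulas, so they are not covered, and a centroid says nothing about the limits of each 3D volume.

```python
"""Derive hex and CIELAB encodings from an 8-bit sRGB centroid (sketch)."""

# Commonly used linear-sRGB -> CIE XYZ matrix for a D65 white point.
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
# D65 reference white, with Y normalised to 1.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def srgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an 8-bit sRGB triple as an upper-case hex colour code."""
    return "#" + "".join(f"{c:02X}" for c in rgb)


def _linearise(c8: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t: float) -> float:
    """CIELAB non-linearity, with its linear segment near zero."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta**3 else t / (3 * delta**2) + 4 / 29


def srgb_to_cielab(rgb: tuple[int, int, int]) -> tuple[float, float, float]:
    """Convert an 8-bit sRGB triple to CIELAB (L*, a*, b*) relative to D65."""
    lin = [_linearise(c) for c in rgb]
    xyz = [sum(m * v for m, v in zip(row, lin)) for row in SRGB_TO_XYZ]
    fx, fy, fz = (_lab_f(v / w) for v, w in zip(xyz, WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))
```

For example, srgb_to_hex((255, 128, 0)) gives "#FF8000", and white (255, 255, 255) maps to L* ≈ 100 with a* and b* ≈ 0.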

See table below for summary of vocabulary structure:

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

Initial work can be done in a GitHub repository that I (SAR) created:

https://github.com/saross/colour-names/

The point of truth will reflect the vocabulary sources. If relationships between vocabularies are created and need to be governed, the process and governance will need to be clarified.

(We can later clone this repository to an organisational GitHub account, or somewhere else.)

Maintenance history is visible (make good commit comments!).

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

Design a pattern(s) to be used for each term URI

The vocabulary will be submitted to Research Vocabularies Australia (RVA). What domain should it use? (Discuss with RIT?)

Opaque UUIDs / GUIDs as basis of URI?

Persistence scheme - w3id, doi, handle, ..?
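The two URI options above can be compared side by side. The minimal Python sketch below uses a placeholder base URI (example.org stands in for whatever RVA, w3id or DOI-backed domain is chosen) and assumes terms are keyed by level and ISCC-NBS number. The function names are hypothetical.

```python
"""Two candidate term-URI patterns for the colour-name vocabulary (sketch)."""
import uuid

# Hypothetical base URI: the real domain and persistence scheme are undecided.
BASE_URI = "https://example.org/def/colour-names/"

# Namespace for name-based UUIDs, derived once from the base URI.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, BASE_URI)


def readable_term_uri(level: int, number: int) -> str:
    """Readable pattern: level plus ISCC-NBS number, e.g. .../level3/76.

    The level is included because the same number may occur at more than
    one level.
    """
    return f"{BASE_URI}level{level}/{number}"


def opaque_term_uri(level: int, number: int) -> str:
    """Opaque pattern: a name-based (version 5) UUID.

    uuid5 always returns the same UUID for the same input, so the URIs can
    be regenerated from the source table without keeping a lookup file.
    """
    return BASE_URI + str(uuid.uuid5(_NAMESPACE, f"level{level}/{number}"))
```

For example, readable_term_uri(3, 76) gives https://example.org/def/colour-names/level3/76. A name-based UUID keeps the URI opaque while remaining reproducible. A random uuid4 would instead require the code-to-URI mapping to be stored alongside the vocabulary.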

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms

Richer semantics? Can you create an OWL ontology or RDF Shapes description?

Model: SKOS or OWL?

Likely SKOS, with OWL as required to satisfy user needs
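A basic SKOS encoding of the hierarchy needs only skos:Concept, skos:prefLabel, skos:notation and skos:broader. The Python sketch below writes Turtle by hand, using only the standard library, for the ‘brown’ example in the vocabulary-structure table. It assumes a placeholder base URI and a level/number URI pattern; neither has been decided yet. A real conversion would more likely read the spreadsheet planned in the to-do list and use an RDF library.

```python
"""Emit part of the colour-name hierarchy as SKOS in Turtle (sketch)."""

# Hypothetical base URI; the real one is still to be decided.
BASE = "https://example.org/def/colour-names/"

# (level, number, label, parent as (level, number) or None), taken from the
# 'brown' row of the vocabulary-structure table.
TERMS = [
    (1, 4, "brown", None),
    (2, 9, "yellowish brown", (1, 4)),
    (3, 76, "light yellowish brown", (2, 9)),
]


def concept_uri(level: int, number: int) -> str:
    return f"<{BASE}level{level}/{number}>"


def to_turtle(terms) -> str:
    """Serialise the terms; labels are assumed to contain no quote characters."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{BASE}> a skos:ConceptScheme .",
    ]
    for level, number, label, parent in terms:
        lines += [
            "",
            f"{concept_uri(level, number)} a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            f'    skos:notation "{number}" ;',
        ]
        if parent is None:
            lines.append(f"    skos:topConceptOf <{BASE}> ;")
        else:
            lines.append(f"    skos:broader {concept_uri(*parent)} ;")
        lines.append(f"    skos:inScheme <{BASE}> .")
    return "\n".join(lines) + "\n"


print(to_turtle(TERMS))
```

skos:narrower is omitted because it is the inverse of skos:broader and can be inferred. OWL axioms could be layered on later if user needs require them.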

7. Complete the vocabulary metadata:
Some options: DCAT, LOV metadata guidelines, OMV, MOD

Metadata properties:

Colour names (levels) definitions

  • Hue

  • Value

  • Chroma

  • CIELAB colour cross-reference (centroid of Munsell)

  • scopeNote (maybe)

  • Administrative metadata? (maintainer, etc.)

Some properties to consider, e.g.:

  • dcterms:accrualPeriodicity

  • dcterms:contributor

  • dcterms:creator

  • dcterms:identifier

  • dcterms:modified

  • dcterms:description

  • dcterms:title

  • dcterms:license

  • dcterms:publisher

  • dcterms:rights

  • dcterms:source

  • rdfs:label

  • rdfs:seeAlso
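The metadata properties listed above describe the concept scheme as a whole rather than individual terms. The minimal Python sketch below writes a few of them as Turtle. It assumes the CC0 proposal in section 2 is adopted and uses the NIST copy of the Kelly and Judd book as dcterms:source. The base URI and the creator and publisher values are placeholders, not agreed values.

```python
"""Sketch of ConceptScheme metadata using some of the properties listed above."""

SCHEME = "https://example.org/def/colour-names/"  # hypothetical base URI

# The licence assumes the CC0 proposal is adopted; the source is the NIST copy
# of the Kelly and Judd book. Creator and publisher are PLACEHOLDERS.
METADATA = [
    ("dcterms:title", '"Colour names"@en'),
    ("dcterms:description",
     '"Colour names from the Munsell / ISCC-NBS Colour System"@en'),
    ("dcterms:license", "<https://creativecommons.org/publicdomain/zero/1.0/>"),
    ("dcterms:source",
     "<http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nbsspecialpublication440.pdf>"),
    ("dcterms:creator", '"PLACEHOLDER creator"'),
    ("dcterms:publisher", '"PLACEHOLDER publisher"'),
]


def scheme_metadata_turtle(scheme: str, metadata) -> str:
    """Write one Turtle block describing the scheme; the last triple ends with '.'."""
    lines = [
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{scheme}> a skos:ConceptScheme ;",
    ]
    body = [f"    {prop} {obj} ;" for prop, obj in metadata]
    body[-1] = body[-1][:-1] + "."  # replace the final ';' with '.'
    return "\n".join(lines + body) + "\n"


print(scheme_metadata_turtle(SCHEME, METADATA))
```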

8. Where will you register the vocabulary?
Some options: RVA, LOV, BioPortal, Agroportal, BARTOC, OBO, CESSDA

RVA? Registered and accessible at multiple service / dissemination points?

Make it discoverable in multiple places, wherever the identified communities and members likely to use the resource may look

9. How will you make the vocabulary accessible?

E.g., hosting via RVA, organized into collections.

Identifying the user groups and the custodian, and their roles and responsibilities, will help determine the HTTP configuration required for the vocabulary domain (rule 5)
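Making vocabulary and term URIs resolve usually involves HTTP content negotiation, so that one URI returns HTML to a browser and RDF to a machine client. The Python sketch below shows only the selection logic. It assumes the service offers HTML, Turtle and JSON-LD; that set is an assumption, not a decision from the workshop. In practice this would normally be handled by the hosting or redirect service rather than written by hand.

```python
"""Choose a response format for a vocabulary URI from an HTTP Accept header (sketch)."""
from typing import Optional

# Formats a hypothetical vocabulary service can return, in server preference order.
AVAILABLE = ["text/html", "text/turtle", "application/ld+json"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q-value) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    return ranges


def quality(offered: str, ranges: list[tuple[str, float]]) -> float:
    """q-value of the most specific range matching `offered` (0 if none match)."""
    major = offered.split("/")[0]
    best_specificity, q_value = -1, 0.0
    for media_range, q in ranges:
        if media_range == offered:
            specificity = 2
        elif media_range == f"{major}/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, q_value = specificity, q
    return q_value


def negotiate(header: str, default: str = "text/html") -> Optional[str]:
    """Best available format; the default if no header; None if nothing is acceptable."""
    if not header.strip():
        return default
    ranges = parse_accept(header)
    best, best_q = None, 0.0
    for offered in AVAILABLE:
        q = quality(offered, ranges)
        if q > best_q:
            best, best_q = offered, q
    return best
```

The sketch honours q-values and lets the most specific matching media range win. As a result, "text/*;q=0.5, text/html;q=0" yields Turtle rather than HTML.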

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Potential maintainers: ANZSoil, Geoscience Australia in partnership with, e.g., TERN, Australian Archaeologic Society

Enquire whether the RIT Munsell Colour Laboratory wants a role or can contribute resources

Ideally, some sort of institution that can manage the vocabulary over the long term

Engagement through the AU Vocab Interest Group

Maintenance and governance to be evaluated to suit the needs of the custodian and users

Vocabulary structure (question 3)

Level 1 (Munsell / ISCC-NBS): 4 | brown

Level 2 (Munsell / ISCC-NBS): 9 | yellowish brown

Level 3 (Munsell / ISCC-NBS): 76 | light yellowish brown

Level 4 (Munsell / ISCC-NBS): Munsell 10YR 6/4

Colour space / translation: sRGB, CIELAB, CIExyY, CIERGB, CIEXYZ, CMYK, RAL (DE), PCCS (JP), Hex, Pantone, ??, Centroid?, CIELUV

Geological example using Pantone (from https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/21883): the standard colours archive file linked on the right of that page contains an XLS file for mapping between Pantone and RGB

Other schemas

Munsell colour palette: http://pteromys.melonisland.net/munsell/

https://stackoverflow.com/questions/3620663/color-theory-how-to-convert-munsell-hvc-to-rgb-hsb-hsl

http://purl.obolibrary.org/obo/NCIT_C37927

Communities - potential users

Needs of soil community

The soil community mostly uses Munsell, or colour names with no defined schema. As well as RGB, Munsell and conversions between them, some in the soil community may use the Revised Standard Soil Color Charts (https://www.worldcat.org/title/revised-standard-soil-color-charts/oclc/255574307?referer=di&ht=edition), as in the national standard ‘yellow book’ (http://anzsoil.org/def/au/asls/soil-profile/colour)

And see http://anzsoil.org/def/au/asls/soil-profile/colour-stat

And see p 149 https://www.publish.csiro.au/ebook/download/pdf/8016

The US uses Pantone

Geology

https://github.com/CSIRO-enviro-informatics/interactive-geological-timescale

Colour mapping (RGB and CMYK) https://stratigraphy.org/ICSchart/CGMW_ICS_colour_codes.xlsx

Making map representations accessible to the colour blind (e.g. avoid the colour combinations shown in the Ishihara test plates)

Ishihara Test for Color Blindness: https://www.colour-blindness.com/colour-blindness-tests/ishihara-colour-test-plates/

Farnsworth D-15 Dichotomous Color Blindness Test: https://www.color-blindness.com/color-arrangement-test/

USGS uses an interesting mix of CMYK values and qualitative descriptions (https://pubs.usgs.gov/tm/2005/11B01/05tm11b01.html#heading155113810)

Soils community, including industry, organisations, states and territories, and ANZIS (https://github.com/ANZSoilData): uses Munsell (as three separate concepts: hue, value and chroma), but results are more likely stored in databases as a single concatenated string
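Munsell colours tend to be stored as one concatenated string but modelled as three concepts, so a vocabulary service may need to convert between the two forms. The minimal Python sketch below assumes notations written like ‘10YR 6/4’ (hue step and family, then value/chroma) and neutral greys written like ‘N 5/’. The names are hypothetical, and real soil datasets may contain variants (such as lower-case letters) that this pattern rejects.

```python
"""Split and rebuild concatenated Munsell notations such as '10YR 6/4' (sketch)."""
import re
from typing import NamedTuple

# Hue families, longest first so that e.g. 'YR' is tried before 'Y'.
_FAMILIES = "YR|GY|BG|PB|RP|R|Y|G|B|P"
_NUM = r"\d+(?:\.\d+)?"
_CHROMATIC = re.compile(
    rf"^\s*(?P<step>{_NUM})\s*(?P<family>{_FAMILIES})\s*"
    rf"(?P<value>{_NUM})\s*/\s*(?P<chroma>{_NUM})\s*$"
)
# Neutral greys, written e.g. 'N 5/' or 'N 5/0'.
_NEUTRAL = re.compile(rf"^\s*N\s*(?P<value>{_NUM})\s*/\s*(?:0(?:\.0+)?)?\s*$")


class Munsell(NamedTuple):
    hue: str  # e.g. '10YR', or 'N' for neutral greys
    value: float
    chroma: float


def parse_munsell(text: str) -> Munsell:
    """Split a notation into hue, value and chroma; raise ValueError otherwise."""
    m = _CHROMATIC.match(text)
    if m:
        return Munsell(m["step"] + m["family"], float(m["value"]), float(m["chroma"]))
    m = _NEUTRAL.match(text)
    if m:
        return Munsell("N", float(m["value"]), 0.0)
    raise ValueError(f"unrecognised Munsell notation: {text!r}")


def format_munsell(colour: Munsell) -> str:
    """Rebuild the concatenated form, e.g. Munsell('10YR', 6.0, 4.0) -> '10YR 6/4'."""
    if colour.hue == "N":
        return f"N {colour.value:g}/"
    return f"{colour.hue} {colour.value:g}/{colour.chroma:g}"
```

For example, parse_munsell("10YR 6/4") returns Munsell(hue='10YR', value=6.0, chroma=4.0), and format_munsell turns it back into "10YR 6/4".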

Geoscience Australia - Irina - rock colour

On discussion list - AVSIG, AGLDWG National, Research Data Alliance

SKOS provides a ‘collection’ (skos:Collection), which could represent a subset for a domain, e.g., soil colours?

To do

  1. Interest/Governance follow up (Shawn)

  2. Touch base with some main user groups to see what they use/what content they would like/resolution required/how they would like the information represented - Soils Peter W/Andrew Biggs (NCST) (Megan), GA Irina? (Mark - do you want to follow up? [mdl] Yep - I’ll do that and also ask CGI )

  3. Make content as spreadsheet - at least most basic groupings (Shawn)

  4. Convert to SKOS (Anu)

  5. Trial on RVA test infrastructure (Rowan)

Questions for RIT:

Why are two colours (brownish orange; brownish pink) missing from Level 2 when compared to ISCC-NBS (page 4)? OK: the list captured here is Table 1 on page A-7, while Table 1 on page 4 lists ‘Hue names’. These overlap but are different?

Why are the colours ordered differently from the ISCC-NBS system? Does it mean something?

To derive:

(Color: universal language and dictionary of names, Table 1, page A7)

Level 4 = 1 Hue step, 1 Value step, and 2 Chroma steps

Level 5 = 0.5 Hue step, 0.1 Value step, and 0.25 Chroma step

Level 6 = 0.1 Hue step, 0.05 Value step, and 0.1 Chroma step
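The step sizes above can be checked mechanically before any levels are generated. The Python sketch below assumes each step describes the size of one cell at that level. It uses exact fractions to count how many finer cells fit in a coarser one and to enumerate a finer grid. One level-4 cell holds 2 × 10 × 8 = 160 level-5 points. However, the level-6 chroma step (0.1) does not divide the level-5 chroma step (0.25), so level-6 cells do not nest inside level-5 cells; this bears on whether to collapse levels 5 and 6. The corner 10YR 6/4 is written numerically as (10, 6, 4) for illustration only.

```python
"""Check how the level 4-6 step sizes nest, and enumerate a finer grid (sketch)."""
from fractions import Fraction as F
from itertools import product

# (hue, value, chroma) step sizes per level, as quoted above from Table 1, p. A7.
STEPS = {
    4: (F(1), F(1), F(2)),
    5: (F(1, 2), F(1, 10), F(1, 4)),
    6: (F(1, 10), F(1, 20), F(1, 10)),
}


def subdivisions(coarse: int, fine: int) -> tuple[F, F, F]:
    """Number of finer steps per coarser step, in each dimension."""
    return tuple(c / f for c, f in zip(STEPS[coarse], STEPS[fine]))


def grid(corner: tuple[F, F, F], coarse: int, fine: int):
    """All finer-level (hue, value, chroma) points inside one coarser cell.

    `corner` is the cell's lowest (hue, value, chroma) in numeric form. Hue is
    treated as a plain number, so wrap-around between hue families is ignored.
    """
    counts = subdivisions(coarse, fine)
    if any(n.denominator != 1 for n in counts):
        raise ValueError(f"level {fine} does not tile level {coarse}: {counts}")
    axes = [
        [start + i * step for i in range(int(n))]
        for start, step, n in zip(corner, STEPS[fine], counts)
    ]
    return list(product(*axes))
```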

How do we ‘trim’ the results? With the ‘real’ values from the 1943 renotation? Or just use the renotation data and interpolate where needed? Hue is really the only problem.

Plus: ensure that all colours in the RIT resource and in the X-Rite resource are represented. Does that mean we just do all 0.25 hue steps?

Collapse levels 5-6 and just express as a decimal? Or keep Level 5 (generate it programmatically?), and Level 6 becomes

2. AS 4590 - dwelling and address types

Name of vocabulary

AS 4590 - Address codes - “sub-dwelling unit-type”

Scope (subject) of vocabulary

Dwelling and address types

Who uses this vocabulary and what for?

Many, though some of them don’t actually know that they are

Vocabulary source

AS 4590:2017 (same as AS 4590:2006); CSV file extracted here: https://drive.google.com/drive/u/0/folders/1pVZCBhlKw-ivzURn6XCUvVq3fijXHOHq

Similar to the G-NAF flat types (AS 4819), possibly the source

Compare with the ABS “Functional Classification of Buildings”, and also with the Australian building code

Who contributed to this case-study?

Nick Car

Irina Bastrakova

Jenny Long

Edmond Chuc (day 1)

Simon Cox

Michael Biddington

Armin Haller (day 2)

Jason Atkinson (day 2)

1. Who is the custodian of the vocabulary?
(Community, organization, etc.)

Standards Australia IT-027 (“Data management and interchange”) but IT-004 also has some role (listed by Standards Australia as the “publisher”).

Shadows ISO/IEC JTC 1/SC 32: is this vocabulary actually maintained by ISO/IEC?

See also the ABS and the Australian building regulations, which have related code lists designed for different needs.

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?

This content appears to be a subset of the ABS “Functional Classification of Buildings”. We could make the ABS vocabulary the FAIRification target here, and then provide an interface to the AS 4590 subset for those users who need it.

Alert the owners that we want to FAIRify, but move ahead anyway.

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?

Are there cross-references to items in other vocabularies?

Unique labels: yes. Complete definitions: no. It is not known where definitions could come from (these are common terms).

N/A

No - flat list

No

No, but it seems either to be derived from AS 4819 “Flat Types” (https://gnafld.net/def/gnaf/code/FlatTypes) or vice versa

4. Point of truth:

What is the maintenance environment for the vocabulary content?
(e.g. a file in a version control repository)

Is the vocabulary maintenance history externally visible?

The standards document, we presume

No

5. Web identifiers:

What is the base URI for the vocabulary? Can it be managed over the long term?

Design a pattern(s) to be used for each term URI

https://linked.data.gov.au/def/?? - assumed

This would be easy for a flat list like this one, once we establish a vocabulary ID. If that were as4590-sdut (Sub-dwelling Unit Type), then, given that we have codes, the URI for Studio might be: https://linked.data.gov.au/def/as4590-sdut/stu
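The URI pattern above can be applied mechanically to every code. The minimal Python sketch below assumes the https://linked.data.gov.au/def/ base and the as4590-sdut ID proposed above. Whether codes appear in upper case in the extracted CSV is not recorded, so lower-casing them to match the ‘stu’ example is an assumption. The check for clashing URIs follows from that assumption.

```python
"""Mint AS 4590 sub-dwelling unit type URIs from their codes (sketch)."""
from urllib.parse import quote

BASE = "https://linked.data.gov.au/def/"  # assumed base, per the notes above
VOCAB_ID = "as4590-sdut"                  # proposed vocabulary ID


def term_uri(code: str) -> str:
    """Lower-case and percent-encode the code, then append it to the vocab URI."""
    return f"{BASE}{VOCAB_ID}/{quote(code.strip().lower(), safe='')}"


def mint_all(codes: list[str]) -> dict[str, str]:
    """Map each code to its URI, refusing codes that collapse to the same URI."""
    uris: dict[str, str] = {}
    seen: set[str] = set()
    for code in codes:
        uri = term_uri(code)
        if uri in seen:
            raise ValueError(f"code {code!r} clashes with an earlier code: {uri}")
        seen.add(uri)
        uris[code] = uri
    return uris
```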

6. Encoding:

Basic hierarchical vocabulary: implement a SKOS representation of the terms

Richer semantics? Can you create an OWL ontology or RDF Shapes description?

Easy peasy

No: no explicit relations known
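Because the list is flat and has no definitions, its SKOS encoding is a concept scheme in which every concept is a top concept. The minimal Python sketch below uses only the standard library and assumes the base URI and vocabulary ID proposed under question 5. The CSV column names and the STU code are hypothetical: the code is inferred from the stu in the Studio URI, since the extracted file's layout is not recorded here.

```python
"""Convert a flat code list (CSV) into a SKOS concept scheme in Turtle (sketch)."""
import csv
import io

# Assumed vocabulary URI (base + proposed ID from question 5).
SCHEME = "https://linked.data.gov.au/def/as4590-sdut"

# Hypothetical extract: the real CSV's column names are not recorded in these
# notes, and only the Studio row is taken from them.
SAMPLE_CSV = "code,label\nSTU,Studio\n"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def csv_to_skos(text: str) -> str:
    """Every row becomes a top concept: the list is flat, with no skos:broader."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "",
        f"<{SCHEME}> a skos:ConceptScheme .",
    ]
    for row in csv.DictReader(io.StringIO(text)):
        code = row["code"].strip()
        lines += [
            "",
            f"<{SCHEME}/{code.lower()}> a skos:Concept ;",
            f"    skos:prefLabel {turtle_literal(row['label'].strip())}@en ;",
            f"    skos:notation {turtle_literal(code)} ;",
            f"    skos:topConceptOf <{SCHEME}> ;",
            f"    skos:inScheme <{SCHEME}> .",
        ]
    return "\n".join(lines) + "\n"


print(csv_to_skos(SAMPLE_CSV))
```

No skos:definition is written, since the source has none. Definitions could be added later if a source for them is found.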

7. Complete the vocabulary metadata:

8. Where will you register the vocabulary?
Some options: RVA, LOV, BioPortal, Agroportal, BARTOC, OBO, CESSDA

RVA - this is an Australian govt thing

9. How will you make the vocabulary accessible?
(i.e. make vocab and term URIs resolve)

Likely via Australian Government Linked Data Working Group (AGLDWG) URIs redirecting to RVA

10. How will the FAIR vocabulary be maintained?
How will the custodians be involved?
How will the community be engaged and informed?

Likely by the standards maintainers (committee IT-027), although they have not yet agreed to take on this task.

3. Police Districts

Name of vocabulary

Historical Police Districts

Scope (subject) of vocabulary

List of names of police districts from 1921 to 1950

Vocabulary source

Various (book, spreadsheet, government report, etc.), derived from past work on the census papers now in the archives.

Who contributed to this case-study?

Len Smith

Sandra Silcot

Susan Birtles

Steven McEachern (scribe)

1. Who is the custodian of the vocabulary?
(Community, organization, etc.)

Various state archives (original lists)

Not yet existing (for the “shared” vocabulary)

2. Licensing:

What is the license on the original vocabulary?

Does it permit repurposing?

What license will apply to the FAIR vocabulary?

None: it doesn’t exist officially yet, but there may be a license on the documents

See above

Depends on the answers to the two questions above

3. Vocabulary content:

Does every term have a unique label, and complete definition?
If not, where could you get them from?

Are the definitions consistent and distinct?

Does the vocabulary have a structure:
Are there well defined subsets?
Is there a hierarchy of terms?

Are there cross-references within the vocabulary?