Report: FAIRification Case Studies
Compiled by Simon J D Cox (CSIRO) simon.cox@csiro.au
from materials generated in the workshop by the contributors listed in each table.
- Introduction
- 1. Colour Names
  - Other schemas
  - Communities - potential users
    - Needs of soil community
    - Geology
    - To do
    - Questions for RIT:
    - To derive:
- 2. AS 4590 - dwelling and address types
- 3. Police Districts
  - Background:
  - Aims:
  - Source information:
  - Comments:
    - Place types:
- 4. Administrative areas
- 5. The Establishment Means GBIF Taxon
Introduction
A recent paper by Cox et al. [2021], ‘Ten simple rules for making a vocabulary FAIR’, outlined a basic recipe for taking an existing vocabulary and making it FAIR (Findable, Accessible, Interoperable and Reusable). The rules in that paper were based on experience with a number of legacy vocabularies.
The initial activity in the FAIR Vocabularies workshop was to examine some additional un-FAIR vocabularies submitted by the participants and explore how the Ten Simple Rules might be applied. The results of this activity are captured in the following grids.
1. Colour Names
Name of vocabulary | Colour names |
Scope (subject) of vocabulary | Colour names from the Munsell / ISCC-NBS Colour System (with equivalencies / translations into other colour spaces) |
Vocabulary source | Book: Color: universal language and dictionary of names / Kenneth L. Kelly and Deane B. Judd. Available from the US government: http://nvlpubs.nist.gov/nistpubs/Legacy/SP/nbsspecialpublication440.pdf
Other copies (we looked at several to see what we could learn about copyright): https://ia801701.us.archive.org/9/items/coloruniversalla00kell/coloruniversalla00kell.pdf https://archive.org/details/coloruniversalla00kell/page/n13/mode/2up https://babel.hathitrust.org/cgi/pt?id=uc1.b4253551&view=1up&seq=5 https://www.google.com.au/books/edition/_/iskurh-fG7IC?hl=en&gbpv=0
Article: ‘sRGB Centroids for the ISCC-NBS Colour System’ (for conversion to sRGB)
The ISCC-NBS Colour System by Paul Centore: https://www.munsellcolourscienceforpainters.com/ISCCNBS/ISCCNB
Munsell Renotation Data: https://www.rit.edu/science/munsell-color-science-lab-educational-resources#munsell-renotation-data
Munsell and Kubelka-Munk toolbox (Octave/MATLAB scripts)
Munsell resources: https://www.munsellcolourscienceforpainters.com/MunsellResources/MunsellResources.html
Convert Munsell to hexadecimal (MIT-licensed R package)
‘An ISCC-NBS Colour List for roloc’ (roloc is an R colour-naming package): https://www.stat.auckland.ac.nz/~paul/Reports/roloc/NBS/roloc-nbs.html
*All Munsell colour designations translated to ISCC-NBS colours can be found in ISCCNBSDesignators.txt (uploaded; see https://www.munsellcolourscienceforpainters.com/ColourSciencePapers/ColourSciencePapers.html, data files for ‘sRGB Centroids for the ISCC-NBS Colour System’)
BGS (Excel sheet): https://webapps.bgs.ac.uk/data/vocabularies/viewdata.cfm?name=DIC_MUNSELL_COLOUR
NOAA PaST Thesaurus: https://www.ncdc.noaa.gov/paleo-search/cvterms?termId=3451 |
Who contributed to this case-study? | Shawn Ross, Mark Lindsay, Megan Wong, Anusuriya Devaraju, Rowan Brownlee |
| None (the source is a 1976 book). Approach the Munsell Color Lab at RIT (https://www.rit.edu/science/munsell-color-lab); a meeting is now being scheduled. |
2. Licensing: What is the license on the original vocabulary? Does it permit repurposing? What license will apply to the FAIR vocabulary? | HathiTrust has digitised Color: universal language and dictionary of names. It specifies the rights for the 1976 edition as public domain, ‘digitised by Google’: use of the images themselves is restricted, but ‘There are no restrictions on use of text transcribed from the images, or paraphrased or translated using the images’. The full catalogue record and the reuse conditions are available from HathiTrust. Still to do: confirm public-domain status internationally, including the ability to produce derivative works, and check whether we can cross-reference or convert to commercial systems such as Munsell and Pantone. Proposed license for the FAIR vocabulary: CC0, with attribution delivered in the payload. |
3. Vocabulary content: Does every term have a unique label, and complete definition? Are the definitions consistent and distinct? Does the vocabulary have a structure: Are there cross-references within the vocabulary? Are there cross-references to items in other vocabularies? |
Yes: each level defines a 3D volume in the Munsell colour space (translation to other colour spaces may be necessary).
Yes: at all levels the definitions are complete (non-overlapping 3D volumes in the Munsell colour space).
Yes: subsets across six levels, with no overlaps and no gaps.
Yes: to other colour spaces.
Some: Pantone is a (commercial) vocabulary; the others are values in other colour spaces (e.g., sRGB; CIE systems such as RGB, LAB, LUV, XYZ and xyY; CMYK; hexadecimal; Pantone; RAL). Open question: do we provide the centroid only, or define the limits of each 3D volume?
See the table below for a summary of the vocabulary structure:
|
4. Point of truth: What is the maintenance environment for the vocabulary content? Is the vocabulary maintenance history externally visible? | Initial work can be done in a GitHub repository I (SAR) created (https://github.com/saross/colour-names/). The point of truth will reflect the vocabulary sources. If relationships between vocabularies are created and need to be governed, the process and governance will need to be clarified (we can clone this repository to an organisational GitHub, or somewhere else, later). Maintenance history is visible (make good commit comments!). |
5. Web identifiers: What is the base URI for the vocabulary? Can it be managed over the long term? Design pattern(s) to be used for each term URI | The vocabulary will be submitted to Research Vocabularies Australia (RVA), but under what domain? (Discuss with RIT?) Use opaque UUIDs/GUIDs as the basis of term URIs? Persistence scheme: w3id, DOI, Handle, ...?
|
6. Encoding: Basic hierarchical vocabulary: implement a SKOS representation of the terms. Richer semantics? Can you create an OWL ontology or RDF Shapes description? | Model: SKOS or OWL? Likely SKOS, with OWL as required to satisfy user needs.
|
7. Complete the vocabulary metadata: | Metadata properties: Colour names (levels) definitions
Some to consider, eg -
|
8. Where will you register the vocabulary? | RVA? Register it and make it accessible at multiple service/dissemination points? Make it discoverable in multiple places, wherever the identified communities and members likely to use the resource may look. |
9. How will you make the vocabulary accessible? | E.g., hosting via RVA, organized into collections. Identifying the user groups and the custodian, and their roles and responsibilities, will help determine the HTTP configuration required for the vocabulary domain (rule 5). |
10. How will the FAIR vocabulary be maintained? | Potential maintainers: ANZSoil; Geoscience Australia in partnership with, e.g., TERN; Australian Archaeologic Society. Enquire with the RIT Munsell Color Lab whether they want a role or can contribute resources. Needs some sort of institution that can manage the vocabulary over the long term. Engagement through the AU Vocab Interest Group. Maintenance and governance to be evaluated to suit the needs of the custodian and users. |
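Rules 5 and 6 above leave both the domain and the URI pattern open. The following is only a minimal sketch of what a SKOS encoding of the hierarchy could look like. It emits Turtle for the single branch shown in the vocabulary-structure table below: 4 brown, 9 yellowish brown, 76 light yellowish brown. The base URI `https://example.org/def/colour-names/` and the `<level>/<number>` path are hypothetical placeholders. Plain string formatting is used instead of an RDF library to keep the sketch dependency-free.

```python
# Minimal sketch: emit SKOS Turtle for one branch of the colour-name hierarchy.
# BASE and the "<level>/<number>" URI pattern are hypothetical placeholders;
# the real domain and identifier scheme are still open questions (rule 5).
BASE = "https://example.org/def/colour-names/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# (level, ISCC-NBS number, label), ordered from broadest to narrowest.
CHAIN = [
    ("level1", "4", "brown"),
    ("level2", "9", "yellowish brown"),
    ("level3", "76", "light yellowish brown"),
]


def concept_uri(level: str, number: str) -> str:
    return f"<{BASE}{level}/{number}>"


def chain_to_turtle(chain) -> str:
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{BASE}> a skos:ConceptScheme ;",
        '    skos:prefLabel "Colour names"@en .',
    ]
    parent = None
    for level, number, label in chain:
        uri = concept_uri(level, number)
        lines += ["", f"{uri} a skos:Concept ;", f'    skos:prefLabel "{label}"@en ;']
        if parent is None:
            # The broadest term is a top concept of the scheme.
            lines.append(f"    skos:topConceptOf <{BASE}> ;")
        else:
            # Each narrower term points to the term one level up.
            lines.append(f"    skos:broader {parent} ;")
        lines.append(f"    skos:inScheme <{BASE}> .")
        parent = uri
    return "\n".join(lines)


print(chain_to_turtle(CHAIN))
```

The level segment appears in the path because the numbering is per level. For example, there is a number 4 at Level 1 and at Level 3, so the number alone is not unique. Opaque UUIDs, as floated under rule 5, would sidestep this at the cost of readable URIs. A library such as rdflib would handle literal escaping and serialisation in production use.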
Vocabulary structure (question 3)
Level 1 (Munsell / ISCC-NBS) | Level 2 (Munsell / ISCC-NBS) | Level 3 (Munsell / ISCC-NBS) | Level 4 (Munsell / ISCC-NBS) | Colour space / translation |
4 brown | 9 yellowish brown | 76 light yellowish brown | Munsell 10YR 6/4 | sRGB, CIELAB, CIExyY, CIERGB, CIEXYZ, CMYK, RAL (DE), PCCS (JP), Hex, Pantone?? Centroid? |
Geological example using Pantone (from https://ecat.ga.gov.au/geonetwork/srv/eng/catalog.search#/metadata/21883): the standard-colours archive file linked at the right of that page contains an XLS mapping Pantone to RGB.
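Question 3 leaves open whether to publish only the centroid of each colour name or the limits of its 3D volume. Publishing limits lets a consumer test membership directly, which a centroid alone cannot support. The sketch below assumes each volume can be expressed as hue, value and chroma ranges on a 0-100 hue circle (10RP = 0). The record layout and the example bounds are hypothetical, not the published ISCC-NBS limits. A name whose region is made up of several such blocks would need a list of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MunsellBlock:
    """One colour-name volume in Munsell space (hypothetical record layout).

    Hue uses an assumed 0-100 hue-circle convention (10RP = 0 = 100, 5R = 5,
    5YR = 15, ...). Lower bounds are inclusive and upper bounds exclusive,
    so blocks that share a face do not overlap.
    """

    label: str
    hue_min: float
    hue_max: float
    value_min: float
    value_max: float
    chroma_min: float
    chroma_max: float

    def contains(self, hue: float, value: float, chroma: float) -> bool:
        h = hue % 100.0
        if self.hue_min <= self.hue_max:
            in_hue = self.hue_min <= h < self.hue_max
        else:
            # The hue range wraps past 10RP (0/100) on the hue circle.
            in_hue = h >= self.hue_min or h < self.hue_max
        return (
            in_hue
            and self.value_min <= value < self.value_max
            and self.chroma_min <= chroma < self.chroma_max
        )


# Illustrative bounds only, NOT the published ISCC-NBS limits.
example = MunsellBlock("example block", 15, 25, 5.5, 6.5, 3, 5)
# 10YR 6/4 corresponds to hue 20 under the assumed convention.
print(example.contains(20, 6, 4))
```

The half-open ranges are one way to guarantee the "no overlap, no gaps" property noted in question 3. Each boundary point belongs to exactly one neighbouring block.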
Other schemas
http://www.ontobee.org/ontology/PATO?iri=http://purl.obolibrary.org/obo/PATO_0000014
https://webapps.bgs.ac.uk/data/vocabularies/dictionary.cfm?name=DIC_MUNSELL_COLOUR
Munsell colour palette: http://pteromys.melonisland.net/munsell/
https://stackoverflow.com/questions/3620663/color-theory-how-to-convert-munsell-hvc-to-rgb-hsb-hsl
http://purl.obolibrary.org/obo/NCIT_C37927
Communities - potential users
Needs of soil community
Mostly use Munsell, or colour names from no defined scheme. As well as RGB, Munsell and conversion between them, some in the soil community may use the Revised Standard Soil Color Charts (https://www.worldcat.org/title/revised-standard-soil-color-charts/oclc/255574307?referer=di&ht=edition), as in the national standard ‘Yellow Book’: http://anzsoil.org/def/au/asls/soil-profile/colour
See also http://anzsoil.org/def/au/asls/soil-profile/colour-stat
See also p. 149 of https://www.publish.csiro.au/ebook/download/pdf/8016
In the US, Pantone is used.
Geology
https://github.com/CSIRO-enviro-informatics/interactive-geological-timescale
Colour mapping (RGB and CMYK): https://stratigraphy.org/ICSchart/CGMW_ICS_colour_codes.xlsx
Making map representations accessible to the colour blind (e.g. avoid the colour combinations shown in the Ishihara test plates)
Ishihara Test for Color Blindness: https://www.colour-blindness.com/colour-blindness-tests/ishihara-colour-test-plates/
Farnsworth D-15 Dichotomous Color Blindness Test: https://www.color-blindness.com/color-arrangement-test/
USGS uses an interesting mix of CMYK and qualitative descriptions: https://pubs.usgs.gov/tm/2005/11B01/05tm11b01.html#heading155113810
Soils community, including industry, organisations, states and territories, and ANZIS (https://github.com/ANZSoilData): uses Munsell (as three separate concepts, though results are more likely stored in databases as a concatenated string).
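Soil results are often stored as one concatenated string. A small parser that recovers the three separate concepts (hue, value, chroma) could therefore sit between those databases and the vocabulary. This is a minimal sketch that assumes the `10YR 6/4` form used in the vocabulary-structure table. It does not handle neutral greys (`N 5/`) or other in-house formats. The 0-100 hue number is an assumed convention, added for sorting and range tests.

```python
import re

# Munsell hue families in order around the hue circle.
HUE_FAMILIES = ["R", "YR", "Y", "GY", "G", "BG", "B", "PB", "P", "RP"]

# Chromatic notation such as "10YR 6/4"; neutral greys ("N 5/") are not handled.
MUNSELL_RE = re.compile(
    r"^\s*(?P<step>\d+(?:\.\d+)?)(?P<family>YR|GY|BG|PB|RP|R|Y|G|B|P)"
    r"\s*(?P<value>\d+(?:\.\d+)?)\s*/\s*(?P<chroma>\d+(?:\.\d+)?)\s*$"
)


def parse_munsell(notation: str) -> dict:
    """Split a concatenated Munsell string into hue, value and chroma."""
    m = MUNSELL_RE.match(notation)
    if m is None:
        raise ValueError(f"not a chromatic Munsell notation: {notation!r}")
    step, family = float(m["step"]), m["family"]
    if not 0 < step <= 10:
        raise ValueError(f"hue step out of range: {notation!r}")
    return {
        "hue": f"{m['step']}{family}",
        # Assumed 0-100 hue circle: 10RP = 0, 5R = 5, 10YR = 20, ...
        "hue_number": (HUE_FAMILIES.index(family) * 10 + step) % 100,
        "value": float(m["value"]),
        "chroma": float(m["chroma"]),
    }


def format_munsell(hue: str, value: float, chroma: float) -> str:
    """Join the three parts back into the "10YR 6/4" form."""
    return f"{hue} {value:g}/{chroma:g}"


print(parse_munsell("10YR 6/4"))
# {'hue': '10YR', 'hue_number': 20.0, 'value': 6.0, 'chroma': 4.0}
```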
Geoscience Australia - Irina - rock colour
On discussion list - AVSIG, AGLDWG National, Research Data Alliance
SKOS provides a ‘collection’ (skos:Collection), which could hold a subset for a domain, e.g., soil colours?
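Used this way, a domain subset is a named grouping of concepts that already exist in the main scheme. SKOS collections sit alongside the broader/narrower hierarchy rather than replacing it, so one concept can belong to several domain collections. The following is a minimal sketch that generates Turtle. All URIs are hypothetical placeholders, and the membership shown is illustrative only.

```python
# Minimal sketch: a domain subset expressed as a SKOS collection.
# All URIs are hypothetical placeholders.
BASE = "https://example.org/def/colour-names/"


def collection_turtle(name: str, label: str, members: list) -> str:
    """Return Turtle for a skos:Collection whose members are existing concepts."""
    if not members:
        # "skos:member ." with no object would not be valid Turtle.
        raise ValueError("a collection needs at least one member")
    members_ttl = ",\n        ".join(f"<{m}>" for m in members)
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
        f"<{BASE}collection/{name}> a skos:Collection ;\n"
        f'    skos:prefLabel "{label}"@en ;\n'
        f"    skos:member {members_ttl} .\n"
    )


# Hypothetical membership: two concepts from the colour hierarchy.
print(collection_turtle("soil", "Soil colours",
                        [f"{BASE}level3/76", f"{BASE}level2/9"]))
```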
To do
Interest/Governance follow up (Shawn)
Touch base with the main user groups to see what they use, what content they would like, the resolution required, and how they would like the information represented: Soils, Peter W / Andrew Biggs (NCST) (Megan); GA, Irina? (Mark, do you want to follow up? [mdl] Yep, I’ll do that and also ask CGI.)
Compile the content as a spreadsheet, at least the most basic groupings (Shawn)
Convert to SKOS (Anu)
Trial on RVA test infrastructure (Rowan)
Questions for RIT:
Why are two colours (brownish orange; brownish pink) missing from Level 2 when compared to ISCC-NBS (page 4)? OK: the list captured here is Table 1 on page A-7, while Table 1 on page 4 is ‘Hue names’. Do the two overlap but differ?
Why are the colours ordered differently from the ISCC-NBS system? Does it mean something?
To derive:
(Color: universal language and dictionary of names, Table 1, page A7)
Level 4 = 1 Hue step, 1 Value step, and 2 Chroma steps
Level 5 = 0.5 Hue step, 0.1 Value step, and 0.25 Chroma step
Level 6 = 0.1 Hue step, 0.05 Value step, and 0.1 Chroma step
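One possible reading of these step sizes is that each level is a regular grid in Munsell space. Under that reading, a colour's designator at a given level is its position snapped to the nearest grid point. The sketch below follows that reading and assumes hue is expressed as a number on a 0-100 hue circle. Its rounding rule (Python's round-half-to-even) is a choice, not something the source specifies. It does not address the trimming question, because it ignores whether a grid point lies inside the renotation data.

```python
# Grid step sizes (hue, value, chroma) for each level, from the list above.
LEVEL_STEPS = {
    4: (1.0, 1.0, 2.0),
    5: (0.5, 0.1, 0.25),
    6: (0.1, 0.05, 0.1),
}


def snap(x: float, step: float) -> float:
    """Round x to the nearest multiple of step.

    Python's round() sends exact .5 ties to the even neighbour; whether that
    tie-breaking rule suits the vocabulary is an open assumption. The outer
    round() strips floating-point noise such as 6.000000000000001.
    """
    return round(round(x / step) * step, 6)


def quantize(hue_number: float, value: float, chroma: float, level: int):
    """Snap a Munsell (hue number, value, chroma) triple onto one level's grid."""
    dh, dv, dc = LEVEL_STEPS[level]
    # Wrap hue so that 100 folds back to 0 on the hue circle.
    return (snap(hue_number, dh) % 100, snap(value, dv), snap(chroma, dc))


for level in (4, 5, 6):
    print(level, quantize(20.3, 6.04, 4.2, level))
```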
How do we ‘trim’ the results? With the ‘real’ values from the 1943 renotation? Or just use the renotation data and interpolate where needed? Hue is really the only problem.
Plus: ensure that all colours in the RIT resource and in the X-Rite resource are represented. Does that mean we just do all 0.25 hue steps?
Collapse levels 5-6 and just express as a decimal? Or keep Level 5 (generate it programmatically?), and Level 6 becomes
2. AS 4590 - dwelling and address types
Name of vocabulary | AS 4590 - Address codes - “sub-dwelling unit-type”
|
Scope (subject) of vocabulary | Dwelling and address types |
Who uses this vocabulary and what for? | Many, though some of them don’t actually know that they are. |
Vocabulary source | AS 4590:2017 (same as AS 4590:2006); CSV file extracted here: https://drive.google.com/drive/u/0/folders/1pVZCBhlKw-ivzURn6XCUvVq3fijXHOHq
Similar to G-NAF flat types (AS 4819), possibly the source.
Compare with the ABS “Functional Classification of Buildings”, and also the Australian building code. |
Who contributed to this case-study? | Nick Car, Irina Bastrakova, Jenny Long, Edmond Chuc (day 1), Simon Cox, Michael Biddington, Armin Haller (day 2), Jason Atkinson (day 2) |
| Standards Australia IT-027 (“Data management and interchange”), though IT-004 also has some role (listed by Standards Australia as the “publisher”).
IT-027 shadows ISO/IEC JTC1/SC 32.
See also the ABS and Australian building regulations, which have related code lists designed for different needs. |
2. Licensing: What is the license on the original vocabulary? Does it permit repurposing? What license will apply to the FAIR vocabulary? | This content appears to be a subset of the ABS “Functional Classification of Buildings”. Maybe make the ABS vocabulary the FAIRification target here, and then provide an interface to the AS 4590 subset for those users who need it.
Alert the owners that we want to FAIRify it, but move ahead anyway. |
3. Vocabulary content: Does every term have a unique label, and complete definition? Are the definitions consistent and distinct? Does the vocabulary have a structure: Are there cross-references within the vocabulary? Are there cross-references to items in other vocabularies? | Unique labels: yes. Complete definitions: no. Source of the terms unknown (they are common terms).
Consistent and distinct definitions: N/A.
Structure: no, a flat list.
Cross-references within the vocabulary: no.
Cross-references to other vocabularies: no, but it seems either to be derived from AS 4419 “Flat Types” (https://gnafld.net/def/gnaf/code/FlatTypes) or vice versa. |
4. Point of truth: What is the maintenance environment for the vocabulary content? Is the vocabulary maintenance history externally visible? |
The standards document, we presume. The maintenance history is not externally visible. |
5. Web identifiers: What is the base URI for the vocabulary? Can it be managed over the long term? Design pattern(s) to be used for each term URI | https://linked.data.gov.au/def/?? (assumed). This would be easy for a flat list once a vocabulary ID is established. If that ID were as4590-sdut (Sub-dwelling Unit Type), then, since the terms have codes, the URI for Studio might be https://linked.data.gov.au/def/as4590-sdut/stu |
6. Encoding: Basic hierarchical vocabulary: implement a SKOS representation of the terms. Richer semantics? Can you create an OWL ontology or RDF Shapes description? |
Easy peasy.
No: no explicit relations known. |
7. Complete the vocabulary metadata: |
|
8. Where will you register the vocabulary? | RVA, as this is an Australian government vocabulary. |
9. How will you make the vocabulary accessible? | Likely via Australian Government Linked Data Working Group URIs pointing to RVA. |
10. How will the FAIR vocabulary be maintained? | Likely by the standards maintainers - govt IT-027 committee - but this is not a task they have agreed to yet. |
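The list is flat and has codes, so the URI pattern proposed under rule 5 could be applied mechanically to the extracted CSV. The following is a minimal sketch built on several assumptions. First, the CSV is taken to have `code` and `label` columns; the real column names are not given here. Second, codes are lower-cased to form the local name, as in the Studio example. Third, the vocabulary ID `as4590-sdut` is only the proposal above, not a registered ID.

```python
import csv
import io

# Proposed vocabulary ID and URI pattern from rule 5 above; neither is agreed.
BASE = "https://linked.data.gov.au/def/as4590-sdut/"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def term_uri(code: str) -> str:
    # Assumption: the local name is the lower-cased code, as in ".../stu".
    return BASE + code.strip().lower()


def csv_to_skos(csv_text: str) -> str:
    """Convert a flat code/label list into SKOS Turtle, one concept per row.

    Assumes hypothetical column headers "code" and "label". No
    skos:definition is written because the source has no definitions.
    """
    out = [
        f"@prefix skos: <{SKOS}> .",
        "",
        f"<{BASE}> a skos:ConceptScheme ;",
        '    skos:prefLabel "Sub-dwelling unit type"@en .',
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        code, label = row["code"].strip(), row["label"].strip()
        out += [
            "",
            f"<{term_uri(code)}> a skos:Concept ;",
            f'    skos:notation "{code}" ;',
            f'    skos:prefLabel "{label}"@en ;',
            # A flat list: every concept is a top concept, with no broader links.
            f"    skos:topConceptOf <{BASE}> ;",
            f"    skos:inScheme <{BASE}> .",
        ]
    return "\n".join(out)


# Hypothetical one-row extract; only the Studio -> "stu" mapping comes from the text.
print(csv_to_skos("code,label\nSTU,Studio\n"))
```

Every term is written as a top concept with no `skos:broader`, matching the finding that the list has no internal structure.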
3. Police Districts
Name of vocabulary | Historical Police Districts |
Scope (subject) of vocabulary | List of names of police districts from 1921-1950 |
Vocabulary source | Various: books, spreadsheets, government reports, etc., derived from past work on the census papers now in the archives. |
Who contributed to this case-study? | Len Smith, Sandra Silcot, Susan Birtles, Steven McEachern (scribe) |
| Various state archives (original lists); not yet existing (for the “shared” vocabulary). |
2. Licensing: What is the license on the original vocabulary? Does it permit repurposing? What license will apply to the FAIR vocabulary? | None: the vocabulary doesn’t exist officially yet, but there may be a license on the documents. See above. Depends on (a) and (b). |
3. Vocabulary content: Does every term have a unique label, and complete definition? Are the definitions consistent and distinct? Does the vocabulary have a structure: Are there cross-references within the vocabulary? |