Proof of concept and recommendations of thesaurus, vocabularies, ontologies for given case studies; Recommendations and examples for good practice and libraries of bridging / transformations between ontologies

NOTES: https://docs.google.com/document/d/1Xdj8lIYJa2PiVpNW2u4mii6ZxI69kAjeOhNVSb576VM/edit?usp=sharing

First PM Session, Tue 8 Oct

In which cells in the matrix are we working?

In relation to which stage in processing does this relate?

What level of guidelines are we working on?

Topic 4: 

...

Recommendations of thesaurus, vocabularies, ontologies for given case studies; Recommendations and examples for good practice and libraries of bridging / transformations between ontologies

Data

Guidelines for terminology repositories

Guidelines for terminology developers

Guidelines for selecting a terminology resource

Scope of the Guidelines

First PM Session, Tue 8 Oct

In which cells in the matrix are we working? 

  • Provider of Codelists and Classifications, I1, I2 (cells F-12, F-13)

In relation to which stage in processing does this relate?

  • discovery
  • ETL

What level of guidelines are we working on?

  • High-level
  • Practitioner
  • Technical
  • Ideal output: plan for working in the next day and a half
  • Handling variable levels of expertise
  • How do we ensure that the right solutions get to the right 
  • Start the conversation - architecture diagram
    • Framework based on the use cases and then generalise
    • Infant mortality data - ways of crawling to get the data
    • Three levels:
      • Architecture diagram - will speak to the high level
      • Descriptions on each component - second level
      • Show some code on how it happens - third level

Low-level demos

  • Building (or choosing and assessing) vocabularies / ontologies - demo how to spin up a high-fidelity, specific vocab that can be mapped out to generalised community resources
    • Mapping between formats - webpage/pdf/csv
  • Bridging / mapping ontologies - can we come up with common methods?
  • Filling gaps in terminologies when they are discovered during mapping
  • Maintenance & versioning
  • Provenance of classes and of object properties
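
A minimal sketch of the first demo idea, assuming Python and plain string formatting rather than an RDF library: it emits a one-concept SKOS vocabulary in Turtle and maps the local concept to the NCIT term listed among the IRIs on this page. The `http://example.org/vocab/` namespace, the concept name, the definition wording and the choice of `skos:closeMatch` are illustrative assumptions, not decisions taken by the group.

```python
# Hypothetical local namespace; replace with one you control.
LOCAL_NS = "http://example.org/vocab/"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def skos_concept(local_name, label, definition, close_matches):
    """Return a Turtle snippet for one SKOS concept.

    No string escaping is done, so labels and definitions must not
    contain double quotes or backslashes in this sketch.
    """
    if not close_matches:
        raise ValueError("this sketch expects at least one mapping target")
    matches = " , ".join(f"<{iri}>" for iri in close_matches)
    return "\n".join(
        [
            f"<{LOCAL_NS}{local_name}> a skos:Concept ;",
            f'    skos:prefLabel "{label}"@en ;',
            f'    skos:definition "{definition}"@en ;',
            f"    skos:closeMatch {matches} .",
        ]
    )


def build_vocab():
    """Assemble the prefix header and a single example concept."""
    header = f"@prefix skos: <{SKOS_NS}> .\n\n"
    concept = skos_concept(
        "InfantMortality",
        "infant mortality",
        # Illustrative wording only, not an agreed definition.
        "Deaths of children under one year of age.",
        ["http://purl.obolibrary.org/obo/NCIT_C16729"],
    )
    return header + concept + "\n"


turtle = build_vocab()
print(turtle)
```

Keeping the local concept separate from the community term means the specific vocabulary can stay as detailed as the source data needs, while the mapping statement carries the link to the generalised resource and can be revised independently.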

Use cases

  • Infant mortality
  • Social exclusion / poverty
  • Urban planning + sustainability + modelling 

Data sources - different formats: Excel, TSV, etc

HTTP

ETL 

Unknowns:

  1. How to transform data sources or represent them in hi-fi
  2. What is the transformation process
  3. What is the process to link up the data
  4. What is the ultimate representation for a knowledge graph
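
One possible shape for the transformation step, sketched under assumptions: each row of a tabular source becomes an observation node linked to an indicator IRI, and any cell that cannot be parsed as a number is reported rather than silently dropped. The column names, the sample rows, the `http://example.org/` namespace and its predicates are all made up for illustration; only the DBpedia property IRI is taken from this page. This is not a proposed knowledge graph model.

```python
import csv
import io

INDICATOR = "http://dbpedia.org/ontology/infantMortality"  # IRI listed on this page
EX = "http://example.org/"  # hypothetical namespace
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# Made-up sample standing in for a downloaded CSV file.
SAMPLE_CSV = """country,year,value
AAA,2017,3.2
BBB,2017,n/a
"""


def rows_to_ntriples(text):
    """Convert CSV rows to N-Triples lines and list the rows that were skipped."""
    triples, skipped = [], []
    for number, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        country, year, raw = row["country"], row["year"], row["value"]
        try:
            value = float(raw)
        except ValueError:
            # Keep a record of what was lost instead of dropping it quietly.
            skipped.append((number, raw))
            continue
        obs = f"<{EX}obs/{country}-{year}>"
        triples.append(f"{obs} <{EX}indicator> <{INDICATOR}> .")
        triples.append(f'{obs} <{EX}refArea> "{country}" .')
        triples.append(f'{obs} <{EX}refPeriod> "{year}" .')
        triples.append(f'{obs} <{EX}value> "{value}"^^<{XSD_DOUBLE}> .')
    return triples, skipped


triples, skipped = rows_to_ntriples(SAMPLE_CSV)
print("\n".join(triples))
print("skipped rows:", skipped)
```

Returning the skipped rows alongside the triples is one way to make loss in the conversion visible, so that someone can decide whether the source needs a richer representation.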

Portal/DataService/SPARQL endpoint as a data source

  • How to maintain the link: nano-crediting a class with information about the portal vs adding information about the API rules/calls

Annotation resources

  • datasets/ data files - formats, versions, topics, phenomena, time/space; provenance metadata (where it came from)
  • Variables and parameters
  • API rules/calls
  • Provenance layer

Discussion about linking relational databases to linked data / terminology

  • Conversion to triples (advanced) - beware of lossy conversions: footnotes etc. can be lost
  • SQL tables linking elements (e.g. attributes) to IRIs (simple)
  • Wrapping the relational database, e.g. with D2RQ (http://d2rq.org/)
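
The "simple" option in the list above can be sketched with an in-memory SQLite database: a side table records which IRI each column of a data table refers to, so the data stay relational while the semantics are looked up by query. The table and column names are hypothetical, and the IRI is one of those listed on this page; this is an illustration of the idea, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observations (
        country          TEXT,
        year             INTEGER,
        infant_mortality REAL
    );
    -- Side table: one row per annotated column.
    CREATE TABLE column_iri (
        table_name  TEXT NOT NULL,
        column_name TEXT NOT NULL,
        iri         TEXT NOT NULL,
        PRIMARY KEY (table_name, column_name)
    );
    """
)
conn.execute(
    "INSERT INTO column_iri VALUES (?, ?, ?)",
    (
        "observations",
        "infant_mortality",
        "http://purl.obolibrary.org/obo/NCIT_C16729",
    ),
)


def iri_for(table, column):
    """Return the IRI annotating a column, or None if it has no annotation."""
    row = conn.execute(
        "SELECT iri FROM column_iri WHERE table_name = ? AND column_name = ?",
        (table, column),
    ).fetchone()
    return row[0] if row else None


print(iri_for("observations", "infant_mortality"))
print(iri_for("observations", "year"))
```

Because the annotations live in their own table, they can be added or corrected without touching the data rows.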

General issues

  • Understanding object properties across resources (e.g. from OBO federation OPs to other systems)

Data

https://data.unicef.org/country/deu/

https://data.oecd.org/healthstat/infant-mortality-rates.htm (CSV)

https://data.unicef.org/resources/dataset/child-mortality/  (XLS)

https://data.worldbank.org/indicator/SP.DYN.IMRT.IN (CSV, XML, XLS)

From the whiteboard:

  • Infant Mortality
    • Census
    • Vital statistics
    • DHS
    • Other admin
  • … region / infant mortality

LONDON examples

https://toolbox.google.com/datasetsearch/search?query=infant%20mortality%20london&docid=16hC2R4kFeHKt16pAAAAAA%3D%3D

Data cubes survey: https://colab.research.google.com/drive/1TCZKR7jL9whWkK5uCrQQsj2O5uq_ZZ9f# 

...

Term

...

Search

...

IRIs

...

Infant mortality

...

https://www.ebi.ac.uk/ols/ontologies/ncit/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCIT_C16729

http://purl.obolibrary.org/obo/NCIT_C16729

http://purl.obolibrary.org/obo/OMIT_0008353

...

Infant mortality rate

...

Infant mortality rate

http://epidemiology_ontology.owl#EO:0000074

...

Infant mortality

...

https://lov.linkeddata.es/dataset/lov/terms?q=infant+mortality

...

http://dbpedia.org/ontology/infantMortality
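
The search links listed above follow a simple pattern: the term IRI or the search text is percent-encoded into the query string. A small sketch using only the Python standard library reproduces the two link shapes; the helper names are hypothetical, and the URL patterns are inferred from the links on this page rather than from the services' documentation.

```python
from urllib.parse import quote, urlencode


def ols_term_url(ontology, term_iri):
    """Build an OLS term page link in the shape used on this page."""
    encoded = quote(term_iri, safe="")  # also encode ':' and '/'
    return f"https://www.ebi.ac.uk/ols/ontologies/{ontology}/terms?iri={encoded}"


def lov_search_url(text):
    """Build a LOV term search link in the shape used on this page."""
    return "https://lov.linkeddata.es/dataset/lov/terms?" + urlencode({"q": text})


print(ols_term_url("ncit", "http://purl.obolibrary.org/obo/NCIT_C16729"))
print(lov_search_url("infant mortality"))
```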

Term requests

Guidelines for terminology repositories

Users of terminology repositories

Developers / maintainers of repositories

...

Guidelines for selecting a terminology resource

Technical focus. Break up the master list into user stories for varying degrees of expertise and different points in the workflow:

  • parties publishing datasets
  • parties parsing datasets into something useful that faithfully captures what the dataset says (e.g. using ddi, rdf, etc.)
  • parties integrating extracted data into common integrative representations, e.g. knowledge graphs

Example:

Ten Simple Rules for Selecting a Bio-ontology

James Malone, Robert Stevens, Simon Jupp, Tom Hancocks, Helen Parkinson, Cath Brooksbank

...

  • Licensing
    • Does the resource have an open licence, e.g. CC-0? (can’t practically cite an IRI)
    • Can you fork it and develop independently?
  • Adoption
    • Is the resource used effectively by several adopters (for their specific purposes)? (quality of usage over numerics)
    • Is there a contribution policy?
  • Interoperability  
    • Communities of interoperation - which one(s) do you need your resource to talk to?
    • No attempt to “lock in” users
    • Are they reaching outside their comfort zone? When there is no natural technical bridge, do they also consider the approach?
  • Expressivity:
    • Is the expressivity checked? (Using OWL is no guarantee of meaningful expressivity)
    • How much machine-readable expressivity do you need?
    • Do you need to future-proof? Your work may only need a vocab now (encode as SKOS), but do you plan to do more in the future? Start conversations along the semantic gradient if needed
  • Maintainability
    • How responsive are the maintainers of the ontology with term requests?
    • What is the date of the last commit?
    • How well documented?
    • Are there example queries/competency questions (e.g. http://stato-ontology.org/)?
    • Is there a term deprecation/obsolescence policy?
    • Is it sustainable? E.g. sustained funding or a plurality of developers
    • Are there automated quality checks (e.g. continuous integration)?
  • Governance / Editorial policies
    • Are new editors welcomed / trained? 
    • Is the process open?
  • Tooling available?
    • Are there communities developing tools to use the terminology?
  • Quality
    • Are there natural language definitions for the terms?
    • Are there axiomatic definitions?
  • Coverage
    • Does the ontology have the required terms? What are the gaps?

Caveats

  • Be aware / ask about legacy issues (advanced)

Scope of the Guidelines

  1. The Problem Space – A general statement positioning the challenge in terms of FAIR and any other frameworks which would be approachable from a cross-domain perspective. A discussion of the issues identified to which the guidelines offer solutions, structured along the typology suggested above: technical, methodological, other…
  2. Domain Relevance – What domains are covered by the specific guideline?
  3. Stakeholders – a discussion of the intended audience(s) for the guidelines. Researchers? Data Managers? Funders/strategists? Systems implementers?
  4. Specifications/Standards/Technologies – a description and explanation for the selection of the relevant resources applied to the problem. Domain standards? Generic technologies?
  5. Methodological Considerations – a discussion of the methodological implications of the guideline. Are there best practices in a business sense which would change to help provide a solution?
  6. Proposal – A detailed description of the overall guideline being recommended, and its business justification
  7. Elaboration of the Use Case – a description of the concrete case(s) analyzed in the formulation of the guidelines/solutions
  8. Exemplary Data and Metadata – Concrete examples of the kind of data and metadata being discussed.

...

danbri

Simon Cox

niklas.kolbe

maria-cristina marinescu

Barbara Magagna
Alejandra Gonzalez-Beltran

pbuttigi@mpi-bremen.de


"Tackling cross-domain interoperability issues for FAIR data - Part 1: selecting terminology resources"


Material produced by the group during the week:

Plans for follow-up: