
Topic 4: Proof of concept and recommendations of thesauri, vocabularies and ontologies for given case studies; recommendations and examples of good practice, and libraries of bridging / transformations between ontologies


NOTES: https://docs.google.com/document/d/1Xdj8lIYJa2PiVpNW2u4mii6ZxI69kAjeOhNVSb576VM/edit?usp=sharing


First PM Session, Tue 8 Oct

In which cells in the matrix are we working?

Which stage of processing does this relate to?

What level of guidelines are we working on?




Data

Guidelines for terminology repositories

Guidelines for terminology developers

Guidelines for selecting a terminology resource

Scope of the Guidelines




First PM Session, Tue 8 Oct

In which cells in the matrix are we working? 


  • Provider of Codelists and Classifications, I1, I2 (cells F-12, F-13)


Which stage of processing does this relate to?


  • discovery
  • ETL (extract, transform, load)


What level of guidelines are we working on?


  • High-level
  • Practitioner
  • Technical



  • Ideal output: a plan for our work over the next day and a half



  • Handling variable levels of expertise
  • How do we ensure that the right solutions get to the right 


  • Start the conversation - architecture diagram
    • Framework based on the use cases and then generalise
    • Infant mortality data - ways of crawling to get the data
    • Three levels:
      • Architecture diagram - will speak to the high level
      • Descriptions on each component - second level
      • Show some code on how it happens - third level


Low-level demos

  • Building (or choosing and assessing) vocabularies / ontologies - demo how to spin up a high-fidelity, specific vocab that can be mapped to generalised community resources
    • Mapping between formats - webpage/PDF/CSV
  • Bridging / mapping ontologies - can we come up with common methods?
  • Filling gaps in terminologies when they are discovered during mapping
  • Maintenance & versioning
  • Provenance of classes and of object properties
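A minimal sketch of the first and third demo items, assuming plain Python and N-Triples output. A small, project-specific vocabulary is written out as SKOS concepts. Each local term is linked to a generalised community resource with the standard SKOS mapping properties (skos:exactMatch, skos:closeMatch). Terms with no mapping are then listed as gaps, which are candidates for term requests. All namespaces, identifiers and mappings are hypothetical placeholders.

```python
# Hypothetical sketch: project-specific SKOS vocabulary mapped to a community resource.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespaces: replace with the real project and community IRIs.
LOCAL = "https://example.org/vocab/"
COMMUNITY = "https://example.org/community/"

# High-fidelity terms used in the case study (local id -> preferred label).
local_terms = {
    "infant-death": "Infant death",
    "live-birth": "Live birth",
    "neonatal-death-registered": "Neonatal death (registered)",
}

# Hand-curated mappings (local id -> (SKOS mapping property, community id)).
mappings = {
    "infant-death": ("exactMatch", "InfantDeath"),
    "live-birth": ("closeMatch", "LiveBirth"),
}


def iri(namespace, local_id):
    return f"<{namespace}{local_id}>"


def literal(text, lang="en"):
    # Escape backslashes and quotes as required for N-Triples string literals.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_ntriples(terms, maps):
    """One skos:Concept per term, with its label and (if any) its mapping."""
    lines = []
    for local_id, label in sorted(terms.items()):
        subject = iri(LOCAL, local_id)
        lines.append(f"{subject} <{RDF_TYPE}> <{SKOS}Concept> .")
        lines.append(f"{subject} <{SKOS}prefLabel> {literal(label)} .")
        if local_id in maps:
            prop, target = maps[local_id]
            lines.append(f"{subject} <{SKOS}{prop}> {iri(COMMUNITY, target)} .")
    return lines


def unmapped(terms, maps):
    """Local terms with no mapping: gaps to raise as term requests."""
    return sorted(set(terms) - set(maps))


if __name__ == "__main__":
    print("\n".join(to_ntriples(local_terms, mappings)))
    print("Gaps:", unmapped(local_terms, mappings))
```

Where the fit is uncertain, skos:closeMatch may be the safer choice. The SKOS Reference defines skos:exactMatch as transitive, so an over-eager exact match can propagate across linked vocabularies.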



Use cases

  • Infant mortality
  • Social exclusion / poverty
  • Urban planning + sustainability + modelling 


Data sources - different formats: Excel, TSV, etc.

HTTP

ETL





Unknowns:

  1. How do we transform data sources, or represent them in high fidelity?
  2. What is the transformation process?
  3. What is the process for linking up the data?
  4. What is the ultimate representation for a knowledge graph?
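For unknowns 1 and 2, the following is one possible transformation path, offered as a sketch rather than an agreed process. A tabular export (CSV or TSV, as offered by the portals listed under Data) is read with Python's csv module. Each row then becomes an observation in the W3C RDF Data Cube vocabulary (qb:), using the SDMX-RDF refArea/refPeriod dimensions and the obsValue measure. The column names, base IRI and sample values are hypothetical placeholders, not real statistics. Typing refPeriod as an xsd:gYear literal is a simplification; published cubes often use a time-interval IRI instead. The cube's qb:DataStructureDefinition is also omitted.

```python
import csv
import io

QB = "http://purl.org/linked-data/cube#"
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI for the converted dataset.
BASE = "https://example.org/infant-mortality/"

# Placeholder TSV export (hypothetical codes and values, not real statistics).
SAMPLE_TSV = "country\tyear\trate\nAAA\t2018\t1.5\nBBB\t2018\t\n"


def read_table(text, delimiter=","):
    """Read a CSV (delimiter=',') or TSV (delimiter='\\t') export into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def to_observations(rows, area_col, period_col, value_col):
    """Turn each row into a qb:Observation (N-Triples lines)."""
    dataset = f"<{BASE}dataset>"
    triples = [f"{dataset} <{RDF_TYPE}> <{QB}DataSet> ."]
    for i, row in enumerate(rows):
        value = (row.get(value_col) or "").strip()
        if not value:
            # Skip empty cells instead of inventing a value; any footnote or
            # flag explaining the gap is lost here and needs separate handling.
            continue
        obs = f"<{BASE}obs/{i}>"
        triples += [
            f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
            f"{obs} <{QB}dataSet> {dataset} .",
            f"{obs} <{SDMX_DIM}refArea> <{BASE}area/{row[area_col]}> .",
            f'{obs} <{SDMX_DIM}refPeriod> "{row[period_col]}"^^<{XSD}gYear> .',
            f'{obs} <{SDMX_MEASURE}obsValue> "{value}"^^<{XSD}decimal> .',
        ]
    return triples


if __name__ == "__main__":
    rows = read_table(SAMPLE_TSV, delimiter="\t")
    print("\n".join(to_observations(rows, "country", "year", "rate")))
```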


Portal/DataService/SPARQL endpoint as a data source

  • How to maintain the link: nano-crediting a class with information about the portal vs adding information about the API rules/calls
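This short sketch contrasts the two options in the bullet above. The SPARQL 1.1 Protocol allows a query to be sent by HTTP GET, with the URL-encoded query text in the `query` parameter. The sketch builds that request URL without sending it. It then either credits the class with the portal/endpoint only, or also records the exact call. The endpoint URL, class IRI and record layout are hypothetical.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """SPARQL 1.1 Protocol 'query via GET': the query is URL-encoded in ?query=."""
    return f"{endpoint}?{urlencode({'query': query})}"


def link_record(class_iri, endpoint, query, record_call):
    """Hypothetical record linking a class to the data service it came from.

    record_call=False: credit the class with the portal/endpoint only.
    record_call=True: also store the exact request, so it can be replayed.
    """
    record = {"class": class_iri, "source": endpoint}
    if record_call:
        record["request"] = sparql_get_url(endpoint, query)
    return record


if __name__ == "__main__":
    endpoint = "https://example.org/sparql"  # hypothetical endpoint
    query = "SELECT ?s WHERE { ?s a <https://example.org/vocab/InfantDeath> } LIMIT 10"
    print(link_record("https://example.org/vocab/InfantDeath", endpoint, query, False))
    print(link_record("https://example.org/vocab/InfantDeath", endpoint, query, True))
```

Crediting only the portal survives changes to the API rules but cannot be replayed. Storing the request is reproducible, but it has to be updated whenever the service changes.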


Annotation resources

  • datasets / data files - formats, versions, topics, phenomena, time/space; provenance metadata (where it came from)
  • Variables and parameters
  • API rules/calls
  • Provenance layer
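A minimal sketch of the annotation layer listed above, assuming dataset-level metadata is expressed as triples with widely used vocabularies. Dublin Core terms cover format, subject and temporal/spatial coverage. owl:versionInfo carries the version, and PROV-O prov:wasDerivedFrom records where the data came from. The property linking a dataset to its variables (`hasVariable` in an example.org namespace) is a hypothetical placeholder, as are the dataset IRI and version string. The source URL is the OECD page listed under Data.

```python
DCT = "http://purl.org/dc/terms/"
OWL = "http://www.w3.org/2002/07/owl#"
PROV = "http://www.w3.org/ns/prov#"
EX = "https://example.org/terms/"  # hypothetical namespace

# Metadata key -> predicate IRI.
PREDICATES = {
    "format": DCT + "format",
    "version": OWL + "versionInfo",
    "topic": DCT + "subject",
    "temporal": DCT + "temporal",
    "spatial": DCT + "spatial",
    "derived_from": PROV + "wasDerivedFrom",
    "variable": EX + "hasVariable",  # hypothetical property
}


def term(value):
    """IRIs for http(s) values, plain string literals otherwise."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def annotate(dataset_iri, metadata):
    """N-Triples for each metadata entry; list values give one triple each."""
    triples = []
    for key, values in metadata.items():
        if key not in PREDICATES:
            raise KeyError(f"no predicate chosen for {key!r}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            triples.append(f"<{dataset_iri}> <{PREDICATES[key]}> {term(value)} .")
    return triples


if __name__ == "__main__":
    example = {
        "format": "text/csv",
        "version": "hypothetical-v1",
        "derived_from": "https://data.oecd.org/healthstat/infant-mortality-rates.htm",
        "variable": ["https://example.org/vocab/infant-mortality-rate"],
    }
    print("\n".join(annotate("https://example.org/dataset/imr", example)))
```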


Discussion about linking relational databases to linked data / terminology

  • Conversion to triples (advanced) - beware of lossy conversions: footnotes etc. can be lost
  • SQL tables linking elements (e.g. attributes) to IRIs (simple)
  • Wrapping the relational databases, e.g. with D2RQ (http://d2rq.org/)
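The simple option above (SQL tables linking attributes to IRIs) can be sketched with Python's built-in sqlite3 module. The data table stays as it is, and a separate link table records which terminology IRI each column denotes. Columns with no link are reported as gaps. Table names, column names and IRIs are hypothetical, and the single data row is a placeholder.

```python
import sqlite3


def build_demo():
    """In-memory database with a data table and a column -> IRI link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mortality (country TEXT, year INTEGER, rate REAL);
        CREATE TABLE term_link (
            table_name  TEXT NOT NULL,
            column_name TEXT NOT NULL,
            iri         TEXT NOT NULL,
            PRIMARY KEY (table_name, column_name)
        );
        """
    )
    conn.execute("INSERT INTO mortality VALUES ('AAA', 2018, 1.5)")  # placeholder row
    conn.executemany(
        "INSERT INTO term_link VALUES (?, ?, ?)",
        [
            ("mortality", "country", "https://example.org/vocab/country"),
            ("mortality", "rate", "https://example.org/vocab/infant-mortality-rate"),
        ],
    )
    return conn


def column_iris(conn, table):
    """Map each linked column of `table` to its IRI."""
    rows = conn.execute(
        "SELECT column_name, iri FROM term_link WHERE table_name = ?", (table,)
    )
    return dict(rows.fetchall())


def unlinked_columns(conn, table):
    """Columns of `table` with no IRI in term_link (gaps in the mapping)."""
    if not table.isidentifier():  # PRAGMA arguments cannot be bound parameters
        raise ValueError(f"unexpected table name: {table!r}")
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    linked = column_iris(conn, table)
    return [c for c in columns if c not in linked]


if __name__ == "__main__":
    conn = build_demo()
    print(column_iris(conn, "mortality"))
    print("Unlinked:", unlinked_columns(conn, "mortality"))
```

Because the link table sits beside the data, nothing is converted and no footnotes are lost. The trade-off is that consumers have to know to join against it.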



General issues

  • Understanding object properties across resources (e.g. from OBO federation object properties (OPs) to other systems)





Data

https://data.unicef.org/country/deu/

https://data.oecd.org/healthstat/infant-mortality-rates.htm (CSV)


https://data.unicef.org/resources/dataset/child-mortality/  (XLS)

https://data.worldbank.org/indicator/SP.DYN.IMRT.IN (CSV, XML, XLS)


From the whiteboard:

  • Infant Mortality
    • Census
    • Vital statistics
    • DHS (Demographic and Health Surveys)
    • Other admin
  • … region / infant mortality



London examples


https://toolbox.google.com/datasetsearch/search?query=infant%20mortality%20london&docid=16hC2R4kFeHKt16pAAAAAA%3D%3D



Data cubes survey: https://colab.research.google.com/drive/1TCZKR7jL9whWkK5uCrQQsj2O5uq_ZZ9f# 




Term requests











Guidelines for terminology repositories

Users of terminology repositories

Developers / maintainers of repositories



Guidelines for terminology developers


Guidelines for selecting a terminology resource

Technical focus. Break up the master list into user stories for varying degrees of expertise and different points in the workflow:

  • parties publishing datasets
  • parties parsing datasets into something useful that faithfully captures what the dataset says (e.g. using DDI, RDF, etc.)
  • parties integrating extracted data into common integrative representations, e.g. knowledge graphs



Example:

Ten Simple Rules for Selecting a Bio-ontology

James Malone, Robert Stevens, Simon Jupp, Tom Hancocks, Helen Parkinson, Cath Brooksbank

Published: February 11, 2016 https://doi.org/10.1371/journal.pcbi.1004743


  • Licensing
    • Is the resource openly licensed? CC-0 (can’t practically cite an IRI)
    • Can you fork it and develop independently?
  • Adoption
    • Is the resource used effectively by several adopters (for their specific purposes)? (quality of usage matters more than the number of users)
    • Is there a contribution policy?
  • Interoperability  
    • Communities of interoperation - which one(s) do you need your resource to talk to?
    • No attempt to “lock in” users
    • Are they reaching outside their comfort zone? When there is no natural technical bridge, do they also consider the approach?
  • Expressivity:
    • Is the expressivity checked? (Using OWL is no guarantee of meaningful expressivity.)
    • How much machine-readable expressivity do you need?
    • Do you need to future-proof? Your work may only need a vocab now (encode as SKOS), but do you plan to do more in the future? If needed, start conversations along the semantic gradient.
  • Maintainability
    • How responsive are the maintainers of the ontology to term requests?
    • What is the date of the last commit?
    • How well documented is it?
    • Are there example queries/competency questions (e.g. http://stato-ontology.org/)?
    • Is there a term deprecation/obsolescence policy?
    • Is it sustainable (e.g. sustained funding or a plurality of developers)?
    • Are there automated quality checks (e.g. continuous integration)?
  • Governance / Editorial policies
    • Are new editors welcomed / trained? 
    • Is the process open?
  • Is tooling available?
    • Are there communities developing tools to use the terminology?
  • Quality
    • Are there natural language definitions for the terms?
    • Are there axiomatic definitions?
  • Coverage
    • Does the ontology have the required terms? What are the gaps?
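For the Coverage question, a first-pass check can match the required terms against a candidate ontology's labels and synonyms after simple normalisation of case and whitespace. This is a sketch only. Exact lexical matching misses paraphrases and can produce false hits, so the results still need expert review. The ontology extract, IRIs and required terms are hypothetical.

```python
import re


def normalise(label):
    """Lower-case and collapse whitespace so trivial variants still match."""
    return re.sub(r"\s+", " ", label.strip().lower())


def coverage(required, ontology_labels):
    """Match required terms against {IRI: [label, synonym, ...]}.

    Returns (matches, gaps, fraction covered).
    """
    index = {}
    for term_iri, labels in ontology_labels.items():
        for label in labels:
            index.setdefault(normalise(label), term_iri)  # first IRI wins on clashes
    matches, gaps = {}, []
    for term in required:
        hit = index.get(normalise(term))
        if hit is None:
            gaps.append(term)
        else:
            matches[term] = hit
    fraction = len(matches) / len(required) if required else 1.0
    return matches, gaps, fraction


if __name__ == "__main__":
    # Hypothetical extract of a candidate ontology: IRI -> label and synonyms.
    candidate = {
        "https://example.org/onto/0001": ["infant mortality rate", "IMR"],
        "https://example.org/onto/0002": ["live birth"],
    }
    needed = ["Infant  mortality rate", "Live birth", "Under-five mortality rate"]
    matches, gaps, fraction = coverage(needed, candidate)
    print(matches)
    print("Gaps:", gaps, f"({fraction:.0%} covered)")
```

Any gaps found this way feed directly into the Maintainability questions above, for example how the maintainers respond to term requests.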


Caveats

  • Be aware / ask about legacy issues (advanced)



Scope of the Guidelines



  1. The Problem Space – A general statement positioning the challenge in terms of FAIR and any other frameworks which would be approachable from a cross-domain perspective. A discussion of the issues identified to which the guidelines offer solutions, structured along the typology suggested above: technical, methodological, other…


  2. Domain Relevance – What domains are covered by the specific guideline?


  3. Stakeholders – a discussion of the intended audience(s) for the guidelines. Researchers? Data Managers? Funders/strategists? Systems implementers?


  4. Specifications/Standards/Technologies – a description and explanation for the selection of the relevant resources applied to the problem. Domain standards? Generic technologies?


  5. Methodological Considerations – a discussion of the methodological implications of the guideline. Are there best practices in a business sense which would change to help provide a solution?


  6. Proposal – A detailed description of the overall guideline being recommended, and its business justification.


  7. Elaboration of the Use Case – a description of the concrete case(s) analyzed in the formulation of the guidelines/solutions.


  8. Exemplary Data and Metadata – Concrete examples of the kind of data and metadata being discussed.


  9. Application of Standards/Specifications/Technologies - “Code” examples of how the approach being advocated can be realized, in each of the identified standards/technologies and data/metadata examples.


