RDF Notes

This document follows up on Michel Dumontier's feedback in the 2015 Dagstuhl sprint. Based on a first list of issues composed by Achim, the group discussion goes into more detail. See the RDF Binding document from Dagstuhl for the initial feedback.

Table of contents

Initial notes by Achim

Discussion in the group

Topics

List of contact people who know RDF

Issue on Patterns

Named Graphs

Initial notes by Achim

Joachim Wackerow | 24 November 2015

These notes are mainly based on the minutes of the discussion with Michel Dumontier in Dagstuhl.

1)       The rules in Richard Cyganiak’s document were not really evaluated; only the OWL output from the transformation of XMI was.

a)       Issue: Review incomplete. The rules might just need additions.

2)       General goals:

a)       RDF representation of DDI (meta)data should be as simple as possible and maximally useful for query answering and linked data.

b)       Should have the kind of query ability or inference that helps you answer useful questions.

3)       Currently some invalid OWL is generated. The reason might be issues in the model with ComplexDataTypes. The transformation program doesn’t ignore these issues.

a)       Action: Fix the issues in the model. Make the transformation program more robust so that it can ignore weak model status (is this really possible?).

4)       Untyped items. The background may be that orphan items from the DDI 3.2 import exist in the model.

a)       Action: Review untyped items.

5)       Abstract classes are treated as regular classes.

a)       Action: This should be fixed in the transformation from PIM to PSM by flattening the class hierarchy.

6)       Additional rules for transformation are necessary (as I understand it, for using other vocabularies). A generic transformation (DDI flavor) would not take advantage of the beauty of RDF.

a)       Issue: mapping to other standards or using other standards

7)       Additionally, the XMI should be transformed into an RDF constraint language (ShEx and/or SHACL) for validating instances.

a)       Action: Collection of important validation rules (in English).

8)       A tighter reusable core should be achieved. An ontology-based model can help here.

a)       Action: Review of the model regarding a reusable core; i.e. there might be duplication of similar classes. Take advantage of abstract classes.

b)       Issue: Balance between clever model (in terms of modeling) and implementable model.

9)       Separation of administrative metadata and metadata about the entities.

a)       Action: Review if distinction is clear enough in the model.

10)   Needs further exploration with a Semantic Web expert:

a)       Graphs: use a graph to store the triples about an individual; explicitly relate the individual to the graph. Annotate the graph with record metadata.

b)       Record vs individual knowledge. Assign a versioned identifier for the record and a version-independent identifier for the individual. Couple the right metadata to either the record or individual.

c)       Review the code examples in the minutes, partly related to the two topics above.

d)       Review ontology examples like SIO and HP which Michel Dumontier co-authored.
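
Item 5 above proposes flattening the class hierarchy in the transformation from PIM to PSM, so that abstract classes no longer show up as regular classes. A minimal sketch of that idea in Python follows. The class names, the ModelClass structure and the flatten function are hypothetical illustrations; they are not taken from the DDI model or from the actual transformation program.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelClass:
    """Hypothetical stand-in for a class in the platform-independent model."""
    name: str
    parent: Optional[str] = None
    abstract: bool = False
    properties: list = field(default_factory=list)


def flatten(classes):
    """Return concrete classes only, each with inherited properties copied down."""
    by_name = {c.name: c for c in classes}

    def all_properties(cls):
        # Walk up the hierarchy first so inherited properties come first.
        inherited = all_properties(by_name[cls.parent]) if cls.parent else []
        return inherited + [p for p in cls.properties if p not in inherited]

    return {c.name: all_properties(c) for c in classes if not c.abstract}


# Invented example classes, for illustration only.
model = [
    ModelClass("Identifiable", abstract=True, properties=["agency", "id", "version"]),
    ModelClass("Agent", parent="Identifiable", abstract=True, properties=["label"]),
    ModelClass("Individual", parent="Agent", properties=["name"]),
    ModelClass("Organization", parent="Agent", properties=["address"]),
]

flat = flatten(model)
for name, props in flat.items():
    print(name, props)
```

In this sketch the abstract classes disappear from the output and their properties are repeated on every concrete subclass. Whether the RDF binding should instead keep the abstract classes as superclasses is a separate question for the reviewers.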

Discussion in the group

Achim, Arofan, Jon, Marcel, Wendy | 26 November 2015

List of contact people who know RDF

➔       Chris Munroe (Manchester) Christopher.Munro@manchester.ac.uk

➔       Philip Couch (Manchester) Philip.Couch@manchester.ac.uk

➔       Nathan Cunningham (UKDA) nathan.cunningham@essex.ac.uk

➔       Michel Dumontier (Stanford) michel.dumontier@stanford.edu

➔       Eric Prud’hommeaux (W3C) eric@w3.org

➔       Franck Cotton (INSEE) franck.cotton@insee.fr

➔       Alejandra Gonzalez-Beltran (Oxford)  alejandra.gonzalezbeltran@oerc.ox.ac.uk

➔       Check with Fabio Grita (FAO Scientific Board Member) for possible RDF contact

We should take a sample of what we can produce today (Data Capture and a small example of a few classes like the Agent instance from Dagstuhl) and use this as the basis of an immediate review, plus some specific questions. The emphasis is on starting this dialogue as soon as possible. Start with a hand-crafted OWL example, then work on a corrected auto-generation.

Issue on Patterns

(Submitted as Issue DMT-32)

Two parts:

(1)     Could we define a reusable “core” of the model, to heighten consistency? Emphasis here is on the use of design patterns generally (not just micro-patterns). Could this benefit the DDI model and outputs? (Michel Dumontier from Dagstuhl)

(2)     At Dagstuhl, Gary mentioned the existence and potential use of “micro-patterns”, and suggested DDI could include these in the model. He suggested some examples (see Dagstuhl notes).

Actions:

(1)     Explore the idea of patterns and a reusable core, and identify potential benefits/issues.

(2)     Look at the sites suggested by Gary, and look at his presentations for Ontology workshops. Identify what possible benefits and changes to the model could be made, with a focus on how this might impact the RDF bindings and their use more specifically.

Named Graphs

We know there are many ways to use named graphs in RDF, and DDI should recommend to users how this should be done. There are potential issues with merging of named graphs as well. We do not understand enough about this issue to make an informed decision, but should specifically ask about the use of named graphs when we send out samples to our contact list.
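
As a possible starting point for the samples, the Dagstuhl suggestion recorded in item 10 of the initial notes (one graph per individual, the individual explicitly related to the graph, record metadata attached to the graph) could look roughly like the sketch below. It builds N-Quads lines with plain Python strings. The http://example.org/ URIs and the describedBy linking property are placeholders, and the choice of owl:versionInfo and dcterms:modified for the record metadata is an assumption, not a DDI decision.

```python
EX = "http://example.org/"           # placeholder namespace (assumption)
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DCT = "http://purl.org/dc/terms/"


def iri(value):
    return f"<{value}>"


def literal(value):
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def quad(s, p, o, g=None):
    """Format one N-Quads line; without g the statement is in the default graph."""
    parts = [s, p, o] + ([g] if g else [])
    return " ".join(parts) + " ."


# Version-independent identifier for the individual,
# versioned identifier for the record (used as the graph name).
individual = iri(EX + "agent/42")
record = iri(EX + "record/agent/42/v3")

quads = [
    # Knowledge about the individual is stored inside the record graph.
    quad(individual, iri(RDFS + "label"), literal("Example agent"), record),
    # The individual is explicitly related to the graph (hypothetical property).
    quad(individual, iri(EX + "describedBy"), record),
    # Record metadata annotates the graph, not the individual.
    quad(record, iri(OWL + "versionInfo"), literal("3")),
    quad(record, iri(DCT + "modified"), literal("2015-11-26")),
]

print("\n".join(quads))
```

A question to put to the contact list alongside such a sample is whether the record metadata belongs in the default graph, as here, or in a separate metadata graph.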

Open issue regarding round-tripping of identifiers

There was also some discussion around how identifiers could be round-tripped. Each object already has the agency-id-version identification in the XML form, and has the localID construct to hold URLs coming from an RDF expression. In RDF, the agency, ID, and version would become literal properties of the object they identify. Objects without identifiers could become blank nodes in their RDF form.

Example: A variable might have multiple labels (multiple languages), but the labels don't have identifiers in the model or the XML. The question is whether these labels should have an identifier in the RDF—doing so would break the round-tripping.
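
The mapping discussed above can be sketched as follows: an identified object becomes a URI node that carries agency, ID and version as literal properties, and each label without an identifier becomes a blank node. This is only an illustration under stated assumptions. The URI pattern, the http://example.org/ddi# property names, the sample variable and both helper functions are hypothetical, and literal escaping is left out for brevity.

```python
EX = "http://example.org/ddi#"   # placeholder vocabulary namespace (assumption)
BASE = "http://example.org/id/"  # placeholder URI pattern (assumption)


def to_triples(obj):
    """Map one identified object with unidentified labels to triples."""
    agency, ident, version = obj["agency"], obj["id"], obj["version"]
    subject = f"<{BASE}{agency}/{ident}/{version}>"
    triples = [
        (subject, f"<{EX}agency>", f'"{agency}"'),
        (subject, f"<{EX}id>", f'"{ident}"'),
        (subject, f"<{EX}version>", f'"{version}"'),
    ]
    # Labels have no identifier in the model, so they become blank nodes.
    for n, (lang, text) in enumerate(sorted(obj["labels"].items())):
        node = f"_:label{n}"
        triples.append((subject, f"<{EX}label>", node))
        triples.append((node, f"<{EX}content>", f'"{text}"@{lang}'))
    return triples


def identification(triples):
    """Read agency, ID and version back from the literal properties."""
    wanted = {f"<{EX}{key}>": key for key in ("agency", "id", "version")}
    return {wanted[p]: o.strip('"') for _, p, o in triples if p in wanted}


# Invented sample object, for illustration only.
variable = {
    "agency": "int.example",
    "id": "var-age",
    "version": "1.0.0",
    "labels": {"en": "Age", "de": "Alter"},
}

triples = to_triples(variable)
for s, p, o in triples:
    print(s, p, o, ".")

recovered = identification(triples)
print(recovered)
```

Because the identification is read back from the literals rather than parsed out of the URI, the round trip in this sketch does not depend on the URI pattern. The blank-node labels carry nothing that would have to be written back as an identifier.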

Review of Michel’s Example from Dagstuhl

Input: RDF Binding document from Dagstuhl

Michel presented several alternative ways to “optimize” the RDF generated from the model.

One thing he suggested was the use of equivalent declarations from other namespaces. We are already planning on doing this once equivalent objects are captured in Drupal.

It would also be possible to programmatically inject SIO declarations for everything, without adding anything to Drupal. The question is: who is using SIO outside of Stanford?

Michel also recommended a “flattening” of some constructs: the example was where we use “Content” inside of “description” to handle language equivalence. When you have properties like Description that are repeatable, however, this will cause major problems, and we have many such repeatable properties in the model currently. This is probably not feasible. If there is only one property allowed, this would make sense, but sadly that is not the case.
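
To make the problem with repeatable properties concrete for the contact list, the sketch below flattens the nested structure into language-tagged literals placed directly on the parent. The data layout, the sample texts and the flatten_descriptions helper are hypothetical and chosen only to illustrate the point.

```python
def flatten_descriptions(descriptions):
    """Collapse each Description's per-language Content into (text, lang) literals."""
    return [
        (text, lang)
        for description in descriptions
        for lang, text in description.items()
    ]


# One Description with two language variants: flattening loses nothing.
single = [{"en": "Age in years", "de": "Alter in Jahren"}]

# Two Descriptions, each with two language variants.
repeated = [
    {"en": "Age in years", "de": "Alter in Jahren"},
    {"en": "Asked of all respondents", "de": "An alle Befragten gestellt"},
]

flat = flatten_descriptions(repeated)
for text, lang in flat:
    print(f'"{text}"@{lang}')

# An RDF graph is an unordered set of triples, so position cannot be used
# to pair the literals again; only a per-language grouping remains.
per_language = {}
for text, lang in flat:
    per_language.setdefault(lang, set()).add(text)
print(per_language)
```

With a single Description the flattened literals are unambiguous. With two, nothing in the flattened form says which German text is the equivalent of which English text, which is the information the “Content” construct currently preserves.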

These issues should be better documented and sent to our contact list, accompanied by code examples and a description of our requirements for the model (not only RDF, but XML and other possible syntaxes).