
Materials from External Review work at Dagstuhl Sprint



DDI4 MRT Virtual meeting 2019-01-30

Agenda (as in invite of 2019-01-28)

Goal: To have a draft program of work for NADDI Sprint (to submit to EB), and an initial list of proposed tasks.

  1. Description of what is needed for organizing NADDI Sprint (from Achim, with possible draft planning document?)
  2. Discussion of work topics/tasks sufficient for initial organization/NADDI Sprint

Suggestions:

    - Documentation of datum-based model application to examples of event data, aggregates, etc.

  • Agreed list of data types to be worked on

  - Existing modeling technical requirements/issues

  • Simplification of the model (i.e. less inheritance and less specialized classes)
  • Review of collections (use of appropriate UML properties, use of collections throughout the model)
  • Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
  • Review of views (definition and effective use of subsets of the model)

  - Others

  3. Plan for addressing infrastructure tasks (modeling tools, production framework & process, testing groups/liaison, etc.) to support immediate and longer-term tasks

    - Are there ideas/candidate tools which need to be written up/further explored?

  • Current status of production platform post-Berlin

    - Identify tester and potential testers

(We may not get this far, but if we have time it would be good)

Minutes DDI4 MRT Virtual meeting 2019-01-30

Attendees:  Achim, Arofan, Dan G., Flavio, Hilde, Jay, Larry, Oliver, Wendy

Apologies: Jon

1.      Description of what is needed for organizing NADDI Sprint (from Achim):

A goal for the meeting is to have an agreed document regarding the NADDI Sprint planning ready to send to the AG to inform their discussions at their next meeting, and further to apply for funding of the possible sprint.

Achim prepared the document ‘NADDISprintPlanning.docx’ and sent it to the srg list in advance of the meeting. This is a shell document in which some content needed to be filled in, or reviewed and agreed, during the meeting.

The meeting was structured in three parts: 1) Topics for the possible NADDI Sprint; 2) Review of possible participants and funding; 3) Other organizational issues regarding the possible NADDI Sprint.

1) Topics for the possible NADDI Sprint (see point 2 in the agenda)

a) Documentation of datum-based model application to examples of data structures (to be discussed and agreed at the meeting which data structures to focus on at the Sprint).

b) Discussion and possible resolution of structural model issues:

  • Simplification of the model (i.e. less inheritance and less specialized classes)
  • Review of collections (use of appropriate UML properties, use of collections throughout the model)
  • Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
  • Review of views (definition and effective use of subsets of the model)

Status: Points a) and b) agreed as topics for the possible NADDI Sprint.

Discussion and agreements regarding example data structures a)

Example data structures (point a) were discussed after the structural model issues (point b).

The discussion regarding which data structures to focus on as examples at the possible NADDI Sprint was centered on whether to focus on common vs. complex cases and corner cases.

Dan G. pointed out the importance of modelling complex cases, as more common or simple cases would then be solved at the same time. Others pointed out that issues could occur even if a similar approach is used. Agreement was reached to focus on the common cases as a preparation for the possible NADDI Sprint.

Status:  Agreement was reached to focus on the following common data structures for the possible NADDI Sprint:

  • Rectangular data
  • Event data (wide and narrow data)
  • Single datum points
  • Multidimensional data like data cubes and aggregate data

Arofan and Wendy pointed out that the Variable Cascade documentation (provided for example in the Variable Cascade presentation from the Dagstuhl workshop DDI Train-the-Trainers 2018)  indicates the style and level of information needed for documentation.

        Status: Wendy will add this as a prototype review comment.

After NADDI further data structures may possibly be explored, for example NoSQL (non SQL) data like Hadoop data, graphs etc.

         Status: Agreed

An example from the discussion, provided by Larry, is included in the Appendix.

Discussion and agreements regarding structural model issues b)

Structural model issues (point b) were discussed before the example data structures (point a).

Conceptual resolution/MRT: Jay raised the question of whether structural issues could be resolved conceptually or by using the MRT approach. Flavio pointed out the need to look at many different examples to check structural modelling issues. Achim indicated this could be a topic for the possible face-to-face meeting and something for a work group to focus on in advance.

Complexity of the model: Flavio commented that the model is complex because it is made complex. It has multiple levels and covers both common and domain specific needs. To simplify the understanding, some of the content could for example be hidden for specific user groups.

Achim asks if the model can be improved by focusing on questions like:

  • What is really the core?
  • What are the fine-grained details?
  • What are domain specific things?

Work regarding the complexity of the model could be done in advance and brought to the sprint.

Review of views: The revision of Views is important. Achim points out that even a simple view like the Agency view pulls in a lot of classes.

Flavio points out that Views are complex because they currently are designed to cover multiple dimensions. The Classification View is for example meant to cover reuse, classification management and publishing. This and other views would need separation into smaller sets to be easier to understand.

Larry expresses that the model currently is highly connected but that good documentation can help the understanding.

Status: Agreement to focus on the four bullet points under b) above for the planned Sprint. Tasks should be broken up as much as possible. Smaller groups could work on each of those and get back with a proposal for the full group after a week or two. A specific person should be responsible to follow up on the work on each task.

 2) Review of list of possible participants and funding

The following agreement was made:

 The following people would be available in person for this meeting (their need for funding in parenthesis):

  • Achim Wackerow (travel, accommodation, food)
  • Arofan Gregory (travel, accommodation, food)
  • Dan Gillman (accommodation, food)
  • Flavio Rizzolo (lives in Ottawa)
  • Hilde Orten (to be clarified)
  • Jay Greenfield (accommodation, food)
  • Jon Johnson (accommodation, food)
  • Larry Hoyle (accommodation, food)
  • Wendy Thomas (accommodation, food)

Most of the people would need funding from the DDI Alliance as specified in the NADDISprintPlanning_1_0.docx document.

Oliver Hopt would be available by phone.

3) Other organizational issues regarding the possible NADDI Sprint

Possibilities for meeting location and lodging have been checked out and booked by Flavio and Achim as follows:

  • Two meeting rooms at StatCan for Tuesday and Wednesday
  • StatCan is closed on the Monday due to Easter. A hotel can be used for the Monday meeting for additional costs and a room is booked.
  • 12 rooms are reserved at the hotel. The price is a bit higher on Sunday and Monday than on Tuesday and Wednesday, due to Easter.

Two documents are sent to the AG for their feedback prior to their next meeting (also sent to the srg list):

  • The MRT DDI4 Core proposal document (MRT_DDI4Core_1_0.docx) - sent by Achim on Monday 28th.
  • An agreed, updated version of the NADDI Sprint Planning document (NADDISprintPlanning_1_0.docx) – sent by Achim after the meeting on Wednesday 30th.

Further follow-up is required regarding organizing the start-up of the work, and making plans for what needs to be prepared in advance of the possible NADDI Sprint.

Appendix

Example from Larry related to discussions of point a):

With the ability to describe data at the datum level DDI should be able to describe data like that in the following example through transformations from traditional rectangular (wide) layouts into key-value (tall) representations.

 DDI4 can currently describe the data in the wide layout, but, though we have discussed how to do the tall representation, that work has not been completed in the model.

 Wide data table: [image]

Corresponding tall representation: [image]

Transformations between these layouts are common in data software packages. The SAS code below shows the transformation from the wide to the tall.

Note that in the tall representation the column Source is a pointer to a variable in the wide layout. The column Value1 is not a traditional variable, in that there is no single value domain or concept associated with the whole column; instead, the value domain and concept of each value depend on the pointer in Source.

If we can properly describe datum level metadata we should be able to describe the value domain and concept associated with the “yes” category label (which is actually a code of 1 in the SAS dataset) in the Value1 column. We should also be able to describe the meaning and units of measurement of the value 185 in the same column.

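 /* value labels for the yes/no answer codes */
 proc format;
   value yn
     1="yes"
     2="no"
   ;
 run;

 /* example rectangular (wide) file */
 data fooWide;
   input Name $ Height Answer;
   label Name="Person name"
         Height="height in cm"
         Answer="Answer to 'Are you happy?'";
   format Answer yn.;
   datalines;
 Joe 185 1
 Mary 160 2
 ;
 run;

 /* BY-group processing in PROC TRANSPOSE needs the data sorted by Name */
 proc sort data=work.fooWide;
   by Name;
 run;

 /* wide to tall: one output row per Name and transposed variable */
 proc transpose data=fooWide
      out=work.fooTall(label="Transposed WORK.FOOWIDE")
      prefix=Value
      name=Source
      label=Label
   ;
   by Name;
   var Height Answer;
   /* as in the original example: request the yn. format for Value1 */
   format Value1 yn.;
 run;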
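The point about datum-level metadata can also be illustrated outside SAS. The following is a minimal Python sketch, not part of the minutes or of the DDI4 model: the names WIDE_ROWS, VARIABLES, to_tall and describe are hypothetical, and the unit "cm" is taken from the label of Height in the example. It builds the tall records from the wide rows and then resolves the meaning of each value in Value1 by following the pointer in Source.

```python
# Hypothetical illustration only; none of these names come from DDI4 or SAS.

# The wide (rectangular) layout from the example: one row per person.
WIDE_ROWS = [
    {"Name": "Joe", "Height": 185, "Answer": 1},
    {"Name": "Mary", "Height": 160, "Answer": 2},
]

# Variable-level metadata of the wide layout (labels, unit, code list),
# taken from the labels and the yn format in the example.
VARIABLES = {
    "Height": {"label": "height in cm", "unit": "cm", "codes": None},
    "Answer": {
        "label": "Answer to 'Are you happy?'",
        "unit": None,
        "codes": {1: "yes", 2: "no"},
    },
}


def to_tall(rows, key, value_vars):
    """Turn each wide row into one (key, Source, Value1) record per variable."""
    tall = []
    for row in rows:
        for var in value_vars:
            tall.append({key: row[key], "Source": var, "Value1": row[var]})
    return tall


def describe(datum, variables):
    """Resolve datum-level meaning by following the Source pointer."""
    meta = variables[datum["Source"]]
    value = datum["Value1"]
    if meta["codes"] is not None:
        # Coded value: report the category label and the underlying code.
        return f"{meta['codes'][value]} (code {value})"
    # Measured value: report the value with its unit of measurement.
    return f"{value} {meta['unit']}"


tall = to_tall(WIDE_ROWS, "Name", ["Height", "Answer"])
for datum in tall:
    print(datum["Name"], datum["Source"], describe(datum, VARIABLES))
```

In this sketch the Value1 column on its own carries no single value domain; each value is interpretable only together with its Source entry, which is the situation the datum-level description is meant to cover.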








DDI4 MRT Virtual meeting 2019-01-23

Agenda (as in invite of 2019-01-23)

Goal: This meeting should get us to the point where we are ready to propose a formalization of this effort to the DDI Executive (or take other steps necessary for approval). To that end, the following agenda is proposed: 

Agreeing the document as regards:

- Organization

- Scope

- Timeline

Details on organization and scope:

- MRT Lifecycle feedback loop

- Status of the sub-groups (see new version of document, section on organisation and structure, as well as section 4 in the minutes of last meeting).

- Alignment of other standards, provenance (see new version of document, section on alignment with metadata structures in DDI4 Core, and discussions in the Appendix of the minutes of last meeting)

- Finalizing the document and process for approval: Things to be added, changed or removed – or approve new version as is?

Minutes DDI4 MRT Virtual meeting 2019-01-23

Attendees:  Achim, Arofan, Hilde, Jay, Jon, Larry, Oliver, Wendy

Organization and scope:

Basis for the discussion:  Document updated by Arofan with input from Achim, ‘MRT_DDI4Core_Diff_0_2_and_0_3_JW’, attachment to email from Achim to srg list 2019-01-23.

The goal of the meeting was to finalize the MRT-DDI4 Core document to be sent to the Advisory Group for their comments.

         Jay suggested to start to plan work tasks as well at this meeting.

                Status: Agreement to take on tasks later on, and to prioritize the finalization of the document at this meeting.

Discussions regarding the document:

-Organisation and Structure: Achim points out that the Core group guides the whole effort, defines sub-tasks and assigns to each task a responsible person who reports back to the full group. The feedback loops should be done in short, iterative cycles. Tasks are not long term, and should be discrete and well-defined. It is important that the document reflects this.

        Status: Agreed

-Role of the MRT in the organization: Wendy asks how the MRT relates to other DDI groups. Larry points out that this is a new way of organizing the work of the Modelling Team.

        Status: Agreement that MRT replaces the Modelling Team. The work of the group should be well aligned with the Advisory Group.

-MRT feedback loop: The requirements for the feedback loop were discussed. Achim points out that the requirements are just a repetition of earlier goals of the Moving Forward project.

Proposals for amendments to bullet points (for clarification purposes):

-Remove ‘if required’ from the ‘lossless roundtrip’ bullet point.

-Add ‘Stability’ to the ‘Consistency’ bullet point.

-‘Persistence of the model’ change to ‘Persistent expression of the model in canonical form’ (not to be confused with canonical XMI).

       Status: Agreement to update the bullet points accordingly.

-Mapping of DDI4 to earlier versions of DDI: This was discussed at our last meeting (January 16th). Larry points out that conformance and divergence with previous versions of DDI should be clearly defined.

Status: Agreed to include a section on this in the document.

-Alignment with other metadata standards: Jay asks if SDMX should be mentioned in the list of standards included in the document. Arofan points out that the document indicates ‘at least’ which standards the DDI4 Core should be interoperable with.

        Status: Agreed to highlight ‘at least’ in the document.

-Production process: Jon asks if the production process should be mentioned in the document. Arofan points out that this is a big and important topic that needs to be addressed. We will need to come back to what the options are.

Status: Agreement that the production process should be identified in the document as something that would need to be addressed.

Timeline:

-Timing of the DDI4 Core work.

Status: Agreement for the DDI Core work to take place in rapid cycles of weeks, not months. A calendar year is the anticipated goal. Leave wording in the document as stands.

-Timing of the finalization of the document: Wendy, Achim and Arofan point out the importance of finalizing the document before the next meeting of the Advisory Group (scheduled for next Wednesday).

        Status: Agreed to finalise the document for Monday and send it to the AG for their comments. Arofan will send it to the MRT group today (the 23rd) for comments.

Other:

NADDI Sprint: Achim proposes a face-to-face meeting with the group after NADDI, of possibly three days, and asked if people in the group would be interested in this. All participants on the call indicated that they would be interested. Their possibilities for attendance and dependencies are specified below:

  • Achim (needs funding from the DDI Alliance)
  • Arofan (needs funding from the DDI Alliance)
  • Dan G.
  • Flavio (will hopefully be able to attend – he lives there)
  • Hilde (needs funding from NSD)
  • Jay (needs room support from the DDI Alliance – will cover his own travel)
  • Jon (depending on acceptance of abstract for the NADDI conference).
  • Larry
  • Oliver (will attend virtually)
  • Wendy (needs funding from the DDI Alliance)

Possible topic for the agenda of the NADDI Sprint: Flavio proposes to focus on the modelling requirements.

                Status: Agreement that Achim follows up with the AG regarding the possible NADDI Sprint and contacts the local organisers regarding possible locations etc.

...

DDI4 MRT Virtual meeting 2019-01-16

Agenda (as in meeting notes from 2019-01-09)

Organizational approach of where we are going and how we organize the approach
Scope, focus, groups, approach proposal
Who else needs to be recruited to make this functional
What is the approval process
OUTCOME: draft for approval

Modeling technical requirements - need to provide a summary for comprehension
Platform questions - approach to addressing this

Sparx cloud modeling approach for UML modeling https://www.sparxsystems.com.au/enterprise-architect/cloud-services/cloud-services.html
Question of production process - where this fits
XMI output - canonical approach

Minutes DDI4 MRT Virtual meeting 2019-01-16

Attendees: 

Achim, Arofan, Dan G., Flavio, Hilde, Jay, Larry, Oliver, Wendy

Topics:

At the meeting the organizational approach was discussed as described below.

UML modelling tools were discussed by email between the last meeting and this meeting by Flavio and Achim (see information under 8) below as well as the full correspondence in the appendix).

Provenance issues and relationship with other models were also discussed between the meetings by Flavio and Jay (also under 8) below and included in the appendix).

Organizational approach:

Basis for the discussion:  

A) Document ‘MRT_DDI4Core_0_2’, attachment in email from Arofan to srg list 2019-09-01.

B) Achim's questions to be clarified in order to build a good basis for the work next year, in email from Achim to srg list 2018-12-12. Achim's questions (1 – 8) and their workflow status are specified below:

  1. Is there agreement that Modeling, Representation, and Testing (MRT) replaces the existing Modeling Group?

Status: Agreed

2. Focus on DDI 4 Core, like Conceptual, Data Description, and Process. These areas are important for any use case perspective. Additional areas can be identified according to business requirements. But the focus on a core increases the chances to have a robust and  mature deliverable.

Status: Agreed to focus on the core

3. Description of major tasks regarding major modeling issues

               -Provenance was brought in as a new topic in a discussion between last week’s meeting and this meeting, see point 8) and the appendix.

Status: Needs follow-up discussions

4. Participants and their roles/perspectives

-Proposal in the ‘MRT_DDI4Core_0_2’ document to have an administration group (coordination team) with sub-groups.

Achim suggests that smaller groups can work independently on different things and get back to bigger groups with recommendations.

        Status: Agreed

-Participants of the MRT coordination team:

Arofan suggests that this group is the MRT group coordinating team, together with Jon who has also expressed interest in this.

Status: Agreed

-Sub-teams: In the ‘MRT_DDI4Core_0_2’ document, three possible sub-teams are proposed, all with an identified lead: a modelling sub-team, a representation sub-team (with sub-teams for each representation: XML, RDF, Python etc.), a documentation sub-team and a testing and representation sub-team.

Larry expresses a concern about the idea of fixed sub-groups, as there may not be enough people for this.

Achim proposes to think in terms of more ad hoc task oriented sub groups. Invite external experts when needed.

         Status: Needs follow-up work

-Perspectives:

A task proposed by Larry that also regards modeling (question 3 above) is to have DDI2 mapped into the DDI4 Core for the end of the year.

Comment from Achim: This work can identify missing pieces and flaws in the modelling. Not sure if all can be resolved by the end of the year, but it should be possible to identify them. Do mapping first and then modelling.

Comment from Arofan: Transformations should be developed.

Flavio: A modelling tool should be used rather than Excel for the mapping.

Status: Agreement about the requirement about the mapping of DDI2 to DDI4. Needs follow-up work.


Task proposed by Achim:  Work on data description usage for different representations and data forms, for example unit record data, event long/short, aggregate/cube, single datum in a lake. Detailed tasks should be developed.

Example data that can be used for this purpose are data from the Australian Election Study (Larry) and the ESS (Achim), the Alpha Network (Jay), possibly others. The Alpha Network has lots of different data types. Several of the relevant data sets are structured in DDI2. Where real data are difficult to provide or new, like a datum in a lake, made up examples should be provided.

Status: Agreement on the task. Needs detailing of sub-tasks, follow-up work, who’s involved etc.


Tools:  Flavio: Need to decide which tools to use in our work (see also discussions under 8) and discussion emails in the appendix).

Status: Needs follow-up

-Administrative work:

Hilde does the meeting minutes

Status: Agreed

       Further administrative work, chairing etc.

                Status: Needs follow up.

5. Is the proposed timeline for a DDI 4 Core at end of 2019 reasonable?

Status: Agreed working goal.

6. The development of the business requirements document can be worked on in parallel but is not a task of this group.

Comment from Arofan:  Some requirements identified by this work that affect the core areas could be interpreted as technical requirements and fed back to the MRT group and be a task for the modelers.

Comment from Achim: The focus of the MRT approach is the goal of a stable DDI4 core in one year. The focus on business requirements should not be tied too closely to the task, in order not to delay this process. It would be important to distinguish between what we need to include to have a functional DDI4 core, and what can be added later.

Comment from Flavio: Agrees with Achim. For each step in the MRT cycle it should be decided what it would make sense to include. The MRT Coordination group should decide on this.

Status: Achim's proposal agreed.

7. Information on these agreements to other groups and DDI Alliance committees

A goal is to finalize a document that takes into account the agreements made regarding the MRT approach. The document about the agreements, which will represent our proposal for the DDI4 core, will be developed and sent to the SB and EB for their approval, as well as to other groups.

Achim and Wendy: The timeliness of this document should be decided on the basis of decisions regarding the work of the group.

               Status: Agreed

8. Identification of issues which can be worked on in the next couple of weeks independently of group meetings.

                - To do for the next meeting (2019-01-20)

Hilde: Will post the minutes

Arofan:  Will prepare agenda for next week with input from others and send out invitation to the meeting to the srg list with agenda and meeting link prior to the meeting.

                 All: Think about the open issues from this meeting.

- Contributions since last week’s meeting (2019-01-09)

                UML tools discussions between Flavio and Achim:

In an email to the srg list of 2019-01-10, Flavio points out that we need to decide on a platform for developing, managing and sharing UML models. He proposes using EA Sparx; its Canonical XMI support needs to be checked. In response, Achim replies in an email to the srg list of 2019-01-15 that this topic needs to be discussed in the MRT and TC groups. Achim suggests using an open tool solution: DDI is a standard, so we cannot risk that a DDI model can be used in only one tool, and cost can be an issue with commercial tools. The Canonical DDI4 XMI has proved importable by many different UML tools. The problem is that most UML tools export a custom XMI flavor rather than Canonical XMI. Achim recommends looking into the Eclipse UML tools, whose XMI flavor can be bridged to Canonical XMI more easily than that of many other tools. Bridging might be supported by the Eclipse community. The Eclipse tools can also do many other things, for example enable transformations from PIM to PSM.

                The slide below, provided by Achim, describes possible usages of Eclipse tools for MRT purposes. See the email conversation in the appendix below for the full argumentation.

                Status: Should be looked further into.


Provenance/lineage and relationship with other standards discussions between Flavio and Jay:

In an email to the srg list of 2019-01-11 Flavio brings up the question of supporting different types of lineage/provenance, and asks if everything we need to capture can be captured by Prov-O or if different standards are needed.     

Jay replies to this by providing references to a review of several provenance models, and some articles.  Jay proposes further to form a provenance sub-group to look into this.

He also raises the issue of whether DDI should copy or plug and play with other models, for example SDMX. In two different emails to the srg list of 2019-01-16, Flavio replies that he leans towards plugging other models in, as a small team like ours would not have the resources to replicate them in DDI. He believes that DDI should specialize in a small, well-designed and well-integrated set of classes to cover the aspects of the data (and metadata) lifecycle that other vocabularies either don't cover or cover poorly. See the full discussion in the emails of the appendix.

Status: To be discussed

Appendix

Email correspondences between meetings between Flavio and Achim on UML tools and Flavio and Jay on provenance and relationship with other tools:

UML Tools:

Email from Flavio to srg list of 2019-01-10

Hi Achim,

We need to make a decision on a platform for developing, managing and sharing UML models. A well-know tool for that purpose is EA Sparx -- We use it at StatCan and in some UNECE HLG projects, e.g. GSIM, CSPA.

Sparx allows users to create arbitrary views on-the-fly by dragging and dropping objects from the underlying model. This way it is possible to deal with smalls subsets of the model at a time during development, and also to target different audiences for communication purposes. It also supports BPMN. The model can be exported to multiple programming languages.

Here you will find some pricing information for the cloud version:

https://sparxsystems.com/products/procloudserver/purchase.html

This is for stand alone licenses:

https://sparxsystems.com/products/ea/shop/index.html

I guess the big technical question, beyond design capabilities, is which version of XMI is supported to make sure we can import/export the model to other platforms. There is some information here, although not conclusive:

https://www.sparxsystems.com/enterprise_architect_user_guide/13.5/model_publishing/exporttoxmi.html

We can always test it and ask Sparxs for more information on their XMI support.

Best,

Flavio


Email from Achim to srg list of 2019-01-15:

Hi Flavio,

Thank you for bringing this up. This will be a question to be discussed and decided by the new MRT group and the TC.

DDI claims to be a standard. In this sense we should try to use a solution which is open for different usages and different users. We should avoid any dependency of a specific tool. I mean this in the sense that there is a risk that the DDI 4 model can only be used in a chosen UML tool. This might be appropriate to a homogeneous environment like a (large) organization. But the requirement in a standards environment is different.

The DDI 4 model is a library. The library should be offered in a way that the library or subsets of it can be used for many purposes. A chosen tool should not be a barrier.

The Canonical XMI format proved to be able to be imported successfully in many major UML tools. In this sense, the DDI 4 as Canonical XMI would be the portable format. This is useful for people who are using directly the model with different tools, i.e. for generating a representation or combining the model with other models.

The issue with the Canonical XMI format is (currently) that most UML tools don't export Canonical XMI but only a custom XMI flavor. MagicDraw and probably Eclipse are the tools which export a XMI flavor which are closer to Canonical XMI.

A general workaround could be to choose a specific UML tool (or to recommend one) and to write a converter from the specific custom XMI to Canonical XMI. (I did this for the XMI flavor which is exported by Lion).

This way, both would be available, a UML tool for model development and a portable XMI format which can be imported into other UML tools.

Another dimension is the cost issue. Any commercial tool costs something. This might be an issue in the standards environment. Enterprise Architect versions start at 229 USD; additional costs apply to software updates.

Enterprise Architect exports to the UML/XMI version 2.4/2.5. (UML 2.4.1 seems to be the most implemented version, 2.5 is the latest.) The issue is that the exported XMI flavor can't be imported in other UML tools. Only MagicDraw offers a custom import of Enterprise Architect XMI 2.1.

There might be possibilities to get free licenses from commercial tools for standards development. I heard that regarding MagicDraw. The issue here is that companies would probably offer only very few licenses. This way, not the whole MRT group could use the UML tool. Furthermore, other users of the model would need a paid license to use the model.

The free and open-source tool Eclipse Papyrus seems to be a good choice. I suggest to look into the Eclipse UML tools in general. Eclipse UML tools use an own custom format for serialization of models, EMF Ecore (Eclipse Modeling Framework), which is available as XMI. A converter would be required for transforming the Ecore XMI format into Canonical XMI. An additional Ecore serializer could be written for Canonical XMI. This might get support from the Eclipse community.

The whole Eclipse UML tools landscape offers much more. There are tools based on the OMG standards QVT Operational (model to model) and MOFM2T (model to text). These tools would enable a transformation from PIM to PSM, and generation of representation encodings (like XSD and OWL). I looked into this a little. My thinking is described in the attached file.

There are Eclipse tools like EMFStore ("repository to store, distribute and collaborate on EMF-based entities (a.k.a. data or models)") and EMF Compare (comparison and merge facility for any kind of EMF Model). It sounds promising but I didn't have a closer look into this. It would need more exploration.

The DDI 4 model uses only a definitive subset of UML class diagrams. This approach builds the basis to create a robust and easy-to-use model which can be used in multiple environments and which can be represented in multiple encodings (representations). A similar approach should be used regarding the UML tool, i.e. using only core features of the tool. This approach can avoid dependencies.

Cheers

Achim

Provenance and relationship with other standards

Email from Flavio to srg list of 2019-01-11

I mentioned in the MT call that we needed to support different types of provenance/lineage. In particular, I'm interested in the so-called why and where provenance. For definitions, please see "Provenance in databases - why how and where - FTinDB 2009" (attached), Sections 1.1.1 and 1.1.3. 

There are many other references, including the original Buneman et al. paper, but this one gives the gist of it. 

Can we represent everything we need to capture these types of provenance with Prov-O or other standards? If yes, how? Else, what is missing?

Food for thought.

Flavio


Email from Jay to srg list of 2019-01-16

Here is a review of several provenance models including the so-called W7 model that Flavio is interested in: http://dcpapers.dublincore.org/pubs/article/viewFile/3709/1932

Here are a couple of articles that align one of these provenance models — PROV-O — with sensor data description and the Internet of Things (IoT):

Sensor Data Provenance: SSNO and PROV-O Together at Last 

Provenance in Systems for Situation Awareness in Environmental Monitoring 

I imagine, in line with suggestions made by Arofan,  that we will want to form a working group on provenance that will make recommendations.

I am thinking that the provenance discussion also raises a larger modeling issue: does DDI intend to copy or plug-and-play other models? We have had this discussion before with Dublin Core. But perhaps we may want to revisit it. That’s because now in DDI 4 we have to decide about SDMX. Because of ongoing UN and EU work, SDMX aggregate data description is sometimes a requirement. In DDI 4 we can continue with the nCubes we currently support in DDI 3.x and perhaps perform an SDMX transformation, we can copy “essential" parts of SDMX or we can perhaps plug-and-play with SDMX. This is really a good use case for thinking about how in the future DDI plans to align itself with other models and standards.


Email from Flavio to srg list of 2019-01-16

Thanks Jay. I need to dig deeper on your references to see whether W7 describes provenance at the datum level. Either way, the first reference is a great summary of approaches.

Regarding your question about whether DDI should copy or plug-in other models, I tend to lean towards the latter. The large number of use cases and vocabularies out there, most of which are under active development, makes it unfeasible for a small team like ours to replicate in DDI. Besides, there is no need to. We just need to understand the use case, the existing vocabulary we'd like to integrate, and create some minimal anchor objects, if necessary, to be able to plug the vocabulary in. I believe DDI should specialize in a small, well-design and well-integrated set of classes to cover the aspects of the data (and metadata) lifecycle that other vocabularies either don't cover or cover poorly.

My five cents.

Flavio


Email from Flavio to srg list of 2019-01-16

Another vocabulary we probably need to integrate with is PMML, for predictive models:

http://dmg.org/pmml/v4-1/GeneralStructure.html

Here is an example of what the model looks like, from a work some folks at StatCan are doing on the health domain:

https://github.com/Ottawa-mHealth/predictive-algorithms/blob/master/CVDPoRT/Reduced/Female/model.xml

Flavio

...