Materials from External Review work at Dagstuhl Sprint

Virtual MRT meeting 2019-01-30

Agenda (as in invite of 2019-01-28)

Goal: To have a draft program of work for NADDI Sprint (to submit to EB), and an initial list of proposed tasks.

Description of what is needed for organizing NADDI Sprint (from Achim, with possible draft planning document?)
Discussion of work topics/tasks sufficient for initial organization/NADDI Sprint

Suggestions:

- Documentation of datum-based model application to examples of event data, aggregates, etc.

Agreed list of data types to be worked on

- Existing modeling technical requirements/issues

Simplification of the model (i.e. less inheritance and fewer specialized classes)
Review of collections (use of appropriate UML properties, use of collections throughout the model)
Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
Review of views (definition and effective use of subsets of the model)

- Others

Plan for addressing infrastructure tasks (modeling tools, production framework & process, testing groups/liaison, etc.) to support immediate and longer-term tasks

- Are there ideas/candidate tools which need to be written up/further explored?

Current status of production platform post-Berlin

- Identify testers and potential testers

(We may not get this far, but if we have time it would be good)

Minutes DDI4 MRT Virtual meeting 2019-01-30

Attendees: Achim, Arofan, Dan G., Flavio, Hilde, Jay, Larry, Oliver, Wendy

Apologies: Jon

1. Description of what is needed for organizing NADDI Sprint (from Achim):

A goal for the meeting is to have an agreed document regarding the NADDI Sprint planning ready to send to the AG to inform their discussions at their next meeting, and further to apply for funding of the possible sprint.

Achim prepared and sent out the document ‘NADDISprintPlanning.docx’ to the srg list in advance of the meeting. This is a shell document in which some content needed to be filled in, or reviewed and agreed, during the meeting.

The meeting was structured in three parts: 1) Topics for the possible NADDI Sprint; 2) Review of possible participants and funding; 3) Other organizational issues regarding the possible NADDI Sprint.

1) Topics for the possible NADDI Sprint (see point 2 in the agenda)

a) Documentation of datum-based model application to examples of data structures (which data structures to focus on at the Sprint was to be discussed and agreed at the meeting).

b) Discussion and possible resolution of structural model issues:

Simplification of the model (i.e. less inheritance and fewer specialized classes)
Review of collections (use of appropriate UML properties, use of collections throughout the model)
Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
Review of views (definition and effective use of subsets of the model)

Status: Points a) and b) agreed as topics for the possible NADDI Sprint.

Discussion and agreements regarding example data structures a)

Example data structures (point a) were discussed after the structural model issues (point b).

The discussion regarding which data structures to focus on as examples at the possible NADDI Sprint was centered on whether to focus on common vs. complex cases and corner cases.

Dan G. pointed out the importance of modelling complex cases, as more common or simple cases would then be solved at the same time. Others pointed out that issues could occur even if a similar approach is used. Agreement was reached to focus on the common cases as a preparation for the possible NADDI Sprint.

Status: Agreement was reached to focus on the following common data structures for the possible NADDI Sprint:

Rectangular data
Event data (wide and narrow data)
Single datum points
Multidimensional data like data cubes and aggregate data

Arofan and Wendy pointed out that the Variable Cascade documentation (provided for example in the Variable Cascade presentation from the Dagstuhl workshop DDI Train-the-Trainers 2018) indicates the style and level of information needed for documentation.

Status: Wendy will add this as a prototype review comment.

After NADDI, further data structures may be explored, for example NoSQL (non-SQL) data such as Hadoop data, graphs, etc.

Status: Agreed

An example from the discussion, provided by Larry, can be found in the Appendix.

Discussion and agreements regarding structural model issues b)

Structural model issues (point b) were discussed before the example data structures (point a).

Conceptual resolution/MRT: Jay raised the question of whether structural issues could be resolved conceptually or by using the MRT approach. Flavio pointed out the need to look at many different examples to check structural modelling issues. Achim indicated that this could be a topic for the possible face-to-face meeting and something for a work group to focus on in advance.

Complexity of the model: Flavio commented that the model is complex because it is made complex. It has multiple levels and covers both common and domain specific needs. To simplify the understanding, some of the content could for example be hidden for specific user groups.

Achim asks if the model can be improved by focusing on questions like:

What is really the core?
What are the fine-grained details?
What are domain specific things?

Work regarding the complexity of the model could be done in advance and brought to the sprint.

Review of views: The revision of Views is important. Achim points out that even a simple view like the Agency view pulls in a lot of classes.

Flavio points out that Views are complex because they currently are designed to cover multiple dimensions. The Classification View is for example meant to cover reuse, classification management and publishing. This and other views would need separation into smaller sets to be easier to understand.

Larry expresses that the model currently is highly connected but that good documentation can help the understanding.

Status: Agreement to focus on the four bullet points under b) above for the planned Sprint. Tasks should be broken up as much as possible. Smaller groups could work on each of them and come back with a proposal for the full group after a week or two. A specific person should be responsible for following up on the work on each task.

2) Review of list of possible participants and funding

The following agreement was made:

The following people would be available in person for this meeting (their need for funding in parentheses):

Achim Wackerow (travel, accommodation, food)
Arofan Gregory (travel, accommodation, food)
Dan Gillman (accommodation, food)
Flavio Rizzolo (lives in Ottawa)
Hilde Orten (to be clarified)
Jay Greenfield (accommodation, food)
Jon Johnson (accommodation, food)
Larry Hoyle (accommodation, food)
Wendy Thomas (accommodation, food)

Most of the people would need funding from the DDI Alliance as specified in the NADDISprintPlanning_1_0.docx document.

Oliver Hopt would be available by phone.

3) Other organizational issues regarding the possible NADDI Sprint

Possibilities for meeting location and lodging have been checked out and booked by Flavio and Achim as follows:

Two meeting rooms at StatCan for Tuesday and Wednesday
StatCan is closed on the Monday due to Easter. A hotel can be used for the Monday meeting for additional costs and a room is booked.
12 rooms are reserved at the hotel. The price is a bit higher on Sunday and Monday than on Tuesday and Wednesday, due to Easter.

Two documents are sent to the AG for their feedback prior to their next meeting (also sent to the srg list):

The MRT DDI4 Core proposal document (MRT_DDI4Core_1_0.docx) - sent by Achim on Monday 28th.
An agreed, updated version of the NADDI Sprint Planning document (NADDISprintPlanning_1_0.docx) - sent by Achim after the meeting on Wednesday 30th.

Further follow-up is required regarding organizing the start-up of the work, and making plans for what needs to be prepared in advance of the possible NADDI Sprint.

Appendix

Example from Larry related to discussions of point a):

With the ability to describe data at the datum level DDI should be able to describe data like that in the following example through transformations from traditional rectangular (wide) layouts into key-value (tall) representations.

DDI4 can currently describe the data in the wide layout, but, though we have discussed how to do the tall representation, that work has not been completed in the model.

Wide data table: [image]

Corresponding tall representation: [image]

Transformations between these layouts are common in data software packages. The SAS code below shows the transformation from the wide to the tall.

Note that in the tall representation the column Source is a pointer to a variable in the wide layout. The column Value1 is not a traditional variable, in that there is no single value domain or concept associated with the whole column; instead, these depend on the pointer in Source.

If we can properly describe datum level metadata we should be able to describe the value domain and concept associated with the “yes” category label (which is actually a code of 1 in the SAS dataset) in the Value1 column. We should also be able to describe the meaning and units of measurement of the value 185 in the same column.

/* format mapping codes to category labels */
proc format;
  value yn
    1="yes"
    2="no"
  ;
run;

/* example rectangular (wide) file */
data fooWide;
  input Name $ Height Answer;
  label Name="Person name"
        Height="height in cm"
        Answer="Answer to 'Are you happy?'";
  format Answer yn.;
  datalines;
Joe 185 1
Mary 160 2
;
run;

/* BY-group processing in PROC TRANSPOSE requires sorted input */
proc sort data=work.fooWide;
  by Name;
run;

/* wide to tall: one row per Name and source variable */
proc transpose data=work.fooWide
    out=work.fooTall(label="Transposed WORK.FOOWIDE")
    prefix=Value
    name=Source
    label=Label
  ;
  by Name;
  var Height Answer;
  format Value1 yn.;
run;
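As a rough illustration of the point made in this appendix, the Python sketch below mirrors the wide-to-tall transformation and shows how the meaning of each datum in Value1 could be resolved through the pointer in Source. It is only a sketch under stated assumptions: the class VariableInfo, the lookup WIDE_VARIABLES and the function names are hypothetical and are not DDI4 classes, and the unit "cm" is simply read off the Height label.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VariableInfo:
    """Hypothetical variable-level metadata for the wide layout (not a DDI4 class)."""
    label: str
    unit: Optional[str] = None              # unit of measurement, if any
    codes: Optional[Dict[int, str]] = None  # code -> category label, if coded


# Assumed metadata, taken from the labels and format in the SAS example.
WIDE_VARIABLES = {
    "Height": VariableInfo(label="height in cm", unit="cm"),
    "Answer": VariableInfo(label="Answer to 'Are you happy?'",
                           codes={1: "yes", 2: "no"}),
}

WIDE_ROWS = [
    {"Name": "Joe", "Height": 185, "Answer": 1},
    {"Name": "Mary", "Height": 160, "Answer": 2},
]


def wide_to_tall(rows: List[dict], id_col: str,
                 variables: Dict[str, VariableInfo]) -> List[dict]:
    """Emit one key-value record per (row, variable) pair, sorted by id_col."""
    tall = []
    for row in sorted(rows, key=lambda r: r[id_col]):
        for source, info in variables.items():
            tall.append({
                id_col: row[id_col],
                "Source": source,        # pointer to the wide variable
                "Label": info.label,
                "Value1": row[source],   # the datum itself
            })
    return tall


def describe_datum(record: dict, variables: Dict[str, VariableInfo]) -> str:
    """Resolve the meaning of one datum through its Source pointer."""
    info = variables[record["Source"]]
    value = record["Value1"]
    if info.codes is not None:
        return f"{info.label}: {info.codes[value]} (code {value})"
    unit = f" {info.unit}" if info.unit else ""
    return f"{info.label}: {value}{unit}"


tall = wide_to_tall(WIDE_ROWS, "Name", WIDE_VARIABLES)
for record in tall:
    print(record["Name"], record["Source"], record["Value1"],
          "->", describe_datum(record, WIDE_VARIABLES))
```

The design choice here is that the tall records carry no value domain of their own; every interpretation goes through Source, which is the dependency that the datum-level description would need to capture.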

DDI4 MRT Virtual meeting 2019-01-23

Agenda (as in invite of 2019-01-23)

Goal: This meeting should get us to the point where we are ready to propose a formalization of this effort to the DDI Executive (or take other steps necessary for approval). To that end, the following agenda is proposed:

Agreeing the document as regards:

- Organization

- Scope

- Timeline

Details on organization and scope:

- MRT Lifecycle feedback loop

- Status of the sub-groups (see new version of document, section on organisation and structure, as well as section 4 in the minutes of last meeting).

- Alignment of other standards, provenance (see new version of document, section on alignment with metadata structures in DDI4 Core, and discussions in the Appendix of the minutes of last meeting)

- Finalizing the document and process for approval: Things to be added, changed or removed – or approve new version as is?

Minutes DDI4 MRT Virtual meeting 2019-01-23

Attendees: Achim, Arofan, Hilde, Jay, Jon, Larry, Oliver, Wendy

Organization and scope:

Basis for the discussion: Document updated by Arofan with input from Achim, ‘MRT_DDI4Core_Diff_0_2_and_0_3_JW’, attachment to email from Achim to srg list 2019-01-23.

The goal of the meeting was to finalize the MRT-DDI4 Core document to be sent to the Advisory Group for their comments.

Jay suggested also starting to plan work tasks at this meeting.

Status: Agreement to take on tasks later on, and to prioritize the finalization of the document at this meeting.

Discussions regarding the document:

-Organisation and Structure: Achim points out that the Core group guides the whole effort, defines sub-tasks and assigns a responsible person for each task, who reports back to the full group. The feedback loops should be done in short, iterative cycles. Tasks are not long term, and should be discrete and well-defined. It is important that the document reflects this.

Status: Agreed

-Role of the MRT in the organization: Wendy asks how the MRT relates to other DDI groups. Larry points out that this is a new way of organizing the work of the Modelling Team.

Status: Agreement that MRT replaces the Modelling Team. The work of the group should be well aligned with the Advisory Group.

-MRT feedback loop: The requirements for the feedback loop were discussed. Achim points out that the requirements are just a repetition of earlier goals of the Moving Forward project.

Proposals for amendments to bullet points (for clarification purposes):

-Remove ‘if required’ from the ‘lossless roundtrip’ bullet point.

-Add ‘Stability’ to the ‘Consistency’ bullet point.

-Change ‘Persistence of the model’ to ‘Persistent expression of the model in canonical form’ (not to be confused with canonical XMI).

Status: Agreement to update the bullet points accordingly.

-Mapping of DDI4 to earlier versions of DDI: This was discussed at our last meeting (January 16th). Larry points out that conformance and divergence with previous versions of DDI should be clearly defined.

Status: Agreed to include a section on this in the document.

-Alignment with other metadata standards: Jay asks if SDMX should be mentioned in the list of standards included in the document. Arofan points out that the document indicates ‘at least’ which standards the DDI4 Core should be interoperable with.

Status: Agreed to highlight ‘at least’ in the document.

-Production process: Jon asks if the production process should be mentioned in the document. Arofan points out that this is a big and important topic that needs to be addressed. We will need to come back to what the options are.

Status: Agreement that the production process should be identified in the document as something that would need to be addressed.

Timeline:

-Timing of the DDI4 Core work.

Status: Agreement for the DDI Core work to take place in rapid cycles of weeks, not months. A calendar year is the anticipated goal. Leave wording in the document as stands.

-Timing of the finalization of the document: Wendy, Achim and Arofan point out the importance of finalizing the document before the next meeting of the Advisory Group (scheduled for next Wednesday).

Status: Agreed to finalise the document by Monday and send it to the AG for their comments. Arofan will send it to the MRT group today (the 23rd) for comments.

Other:

NADDI Sprint: Achim proposes a face-to-face meeting with the group after NADDI, of possibly three days, and asked if people in the group would be interested in this. All participants on the call indicated that they would be interested. Their possibilities for attendance and dependencies are specified below:

Achim (needs funding from the DDI Alliance)
Arofan (needs funding from the DDI Alliance)
Dan G.
Flavio (will hopefully be able to attend – he lives there)
Hilde (needs funding from NSD)
Jay (needs room support from the DDI Alliance – will cover his own travel)
Jon (depending on acceptance of abstract for the NADDI conference).
Larry
Oliver (will attend virtually)
Wendy (needs funding from the DDI Alliance)

Possible topic for the agenda of the NADDI Sprint: Flavio proposes to focus on the modelling requirements.

Status: Agreement that Achim will follow up with the AG regarding the possible NADDI Sprint and contact the local organisers regarding possible locations, etc.

Virtual MRT meeting 2019-01-16

Agenda (as in meeting notes from 2019-01-09)

Organizational approach of where we are going and how we organize the approach
Scope, focus, groups, approach proposal
Who else needs to be recruited to make this functional
What is the approval process
OUTCOME: draft for approval

Modeling technical requirements - need to provide a summary for comprehension
Platform questions - approach to addressing this

Sparx cloud modeling approach for UML modeling https://www.sparxsystems.com.au/enterprise-architect/cloud-services/cloud-services.html
Question of production process - where this fits
XMI output - canonical approach

Minutes DDI4 MRT Virtual meeting 2019-01-16

Attendees:

Achim, Arofan, Dan G., Flavio, Hilde, Jay, Larry, Oliver, Wendy

Topics:

At the meeting the organizational approach was discussed as described below.

UML modelling tools were discussed by email between the last meeting and this meeting by Flavio and Achim (see the information under 8) below, as well as the full correspondence in the appendix).

Provenance issues and relationship with other models were also discussed between the meetings by Flavio and Jay (also under 8) below and included in the appendix).

Organizational approach:

Basis for the discussion:

A) Document ‘MRT_DDI4Core_0_2’, attachment in email from Arofan to srg list 2019-09-01.

B) Achim's questions to be clarified in order to build a good basis for the work next year, in email from Achim to srg list 2018-12-12. Achim's questions (1 – 8) and their workflow status are specified below:

1. Is there agreement that Modeling, Representation, and Testing replaces the existing Modeling Group?

Status: Agreed

2. Focus on DDI 4 Core, like Conceptual, Data Description, and Process. These areas are important for any use case perspective. Additional areas can be identified according to business requirements. But the focus on a core increases the chances to have a robust and mature deliverable.

Status: Agreed to focus on the core

3. Description of major tasks regarding major modeling issues

-Provenance was brought in as a new topic in a discussion between last week’s meeting and this meeting, see point 8) and the appendix.

Status: Needs follow-up discussions

4. Participants and their roles/perspectives

-Proposal in the ‘MRT_DDI4Core_0_2’ document to have an administration group (coordination team) with sub-groups.

Achim suggests that smaller groups can work independently on different things and get back to bigger groups with recommendations.

Status: Agreed

-Participants of the MRT coordination team:

Arofan suggests that this group is the MRT group coordinating team, together with Jon who has also expressed interest in this.

Status: Agreed

-Sub-teams: In the ‘MRT_DDI4Core_0_2’ document, three possible sub-teams are proposed, all with an identified lead: a modelling sub-team, a representation sub-team (with sub-teams for each representation, XML, RDF, Python etc.), a documentation sub-team and a testing and representation sub-team.

Larry expresses a concern about the idea of fixed sub-groups, as there may not be enough people for this.

Achim proposes to think in terms of more ad hoc, task-oriented sub-groups, inviting external experts when needed.

Status: Needs follow-up work

-Perspectives:

A task proposed by Larry that also regards modeling (question 3 above) is to have DDI2 mapped into the DDI4 Core by the end of the year.

Comment from Achim: This work can identify missing pieces and flaws in the modelling. Not sure if all can be resolved by the end of the year, but it should be possible to identify them. Do mapping first and then modelling.

Comment from Arofan: Transformations should be developed.

Flavio: A modelling tool should be used rather than Excel for the mapping.

Status: Agreement on the requirement to map DDI2 to DDI4. Needs follow-up work.

Task proposed by Achim: Work on data description usage for different representations and data forms, for example unit record data, event long/short, aggregate/cube, single datum in a lake. Detailed tasks should be developed.

Example data that can be used for this purpose are data from the Australian Election Study (Larry), the ESS (Achim), the Alpha Network (Jay), and possibly others. The Alpha Network has lots of different data types. Several of the relevant data sets are structured in DDI2. Where real data are difficult to provide or new, like a datum in a lake, made-up examples should be provided.

Status: Agreement on the task. Needs detailing of sub-tasks, follow-up work, who is involved, etc.

Tools: Flavio: Need to decide which tools to use in our work (see also discussions under 8) and discussion emails in the appendix).

Status: Needs follow-up

-Administrative work:

Hilde does the meeting minutes

Status: Agreed

Further administrative work, chairing etc.

Status: Needs follow up.

5. Is the proposed timeline for a DDI 4 Core at end of 2019 reasonable?

Status: Agreed working goal.

6. The development of the business requirements document can be worked on in parallel but is not a task of this group.

Comment from Arofan: Some requirements identified by this work that affect the core areas could be interpreted as technical requirements and fed back to the MRT group and be a task for the modelers.

Comment from Achim: The focus of the MRT approach is the goal of a stable DDI4 core in one year. The focus on business requirements should not be tied too closely to the task, in order not to delay this process. It would be important to distinguish between what we need to include to have a functional DDI4 core, and what can be added later.

Comment from Flavio: Agrees with Achim. For each step in the MRT cycle it should be decided what it would make sense to include. The MRT Coordination group should decide on this.

Status: Achim's proposal agreed.

7. Information on these agreements to other groups and DDI Alliance committees

A goal is to finalize a document that takes into account the agreements made regarding the MRT approach. This document, which will represent our proposal for the DDI4 core, will be developed and sent to the SB and EB for their approval, as well as to other groups.

Achim and Wendy: The timing of this document should be decided on the basis of decisions regarding the work of the group.

Status: Agreed

8. Identification of issues which can be worked on in the next couple of weeks independently of group meetings.

- To do for the next meeting (2019-01-20)

Hilde: Will post the minutes

Arofan: Will prepare agenda for next week with input from others and send out invitation to the meeting to the srg list with agenda and meeting link prior to the meeting.

All: Think about the open issues from this meeting.

- Contributions since last week’s meeting (2019-01-09)

UML tools discussions between Flavio and Achim:

In an email to the srg list of 2019-01-10, Flavio points out that we need to decide on a platform for developing, managing and sharing UML models. He proposes using EA Sparx; its Canonical XMI support needs to be checked. In his reply to the srg list of 2019-01-15, Achim says that this topic needs to be discussed in the MRT and TC groups. He suggests using an open tool solution: DDI is a standard, so we cannot risk that the DDI model can be used in only one tool, and the cost of commercial tools can be an issue. The Canonical DDI4 XMI has proved importable by many different UML tools. The problem is that most UML tools export a custom XMI flavor rather than Canonical XMI. Achim recommends looking into the Eclipse UML tools, whose XMI flavor can be bridged to Canonical XMI more easily than that of many other tools. Bridging might be supported by the Eclipse community. The Eclipse tools can do many other things as well, for example enable transformations from PIM to PSM.

The slide below, provided by Achim, describes possible usages of Eclipse tools for MRT purposes. See the email conversation in the appendix below for the full argumentation.

[Image: slide by Achim on possible usages of Eclipse tools for MRT purposes]

Status: Should be looked further into.

Provenance/lineage and relationship with other standards discussions between Flavio and Jay:

In an email to the srg list of 2019-01-11 Flavio brings up the question of supporting different types of lineage/provenance, and asks if everything we need to capture can be captured by Prov-O or if different standards are needed.

Jay replies to this by providing references to a review of several provenance models, and some articles. Jay proposes further to form a provenance sub-group to look into this.

He also raises the question of whether DDI should copy or plug and play with other models, for example SDMX. In two emails to the srg list of 2019-01-16, Flavio replies that he leans towards plugging in other vocabularies, since a small team would not have the resources to replicate them in DDI, and that he believes DDI should specialize in a small, well-designed and well-integrated set of classes to cover the aspects of the data (and metadata) lifecycle that other vocabularies either don't cover or cover poorly. See the full discussion in the emails of the appendix.

Status: To be discussed

Appendix

Email correspondence between meetings: Flavio and Achim on UML tools, and Flavio and Jay on provenance and the relationship with other standards:

UML Tools:

Email from Flavio to srg list of 2019-01-10

Hi Achim,

We need to make a decision on a platform for developing, managing and sharing UML models. A well-known tool for that purpose is EA Sparx -- We use it at StatCan and in some UNECE HLG projects, e.g. GSIM, CSPA.

Sparx allows users to create arbitrary views on-the-fly by dragging and dropping objects from the underlying model. This way it is possible to deal with small subsets of the model at a time during development, and also to target different audiences for communication purposes. It also supports BPMN. The model can be exported to multiple programming languages.

Here you will find some pricing information for the cloud version:

https://sparxsystems.com/products/procloudserver/purchase.html

This is for stand alone licenses:

https://sparxsystems.com/products/ea/shop/index.html

I guess the big technical question, beyond design capabilities, is which version of XMI is supported to make sure we can import/export the model to other platforms. There is some information here, although not conclusive:

https://www.sparxsystems.com/enterprise_architect_user_guide/13.5/model_publishing/exporttoxmi.html

We can always test it and ask Sparx for more information on their XMI support.

Best,

Flavio

Email from Achim to srg list of 2019-01-15:

Hi Flavio,

Thank you for bringing this up. This will be a question to be discussed and decided by the new MRT group and the TC.

DDI claims to be a standard. In this sense we should try to use a solution which is open for different usages and different users. We should avoid any dependency of a specific tool. I mean this in the sense that there is a risk that the DDI 4 model can only be used in a chosen UML tool. This might be appropriate to a homogeneous environment like a (large) organization. But the requirement in a standards environment is different.

The DDI 4 model is a library. The library should be offered in a way that the library or subsets of it can be used for many purposes. A chosen tool should not be a barrier.

The Canonical XMI format proved to be able to be imported successfully in many major UML tools. In this sense, the DDI 4 as Canonical XMI would be the portable format. This is useful for people who are using directly the model with different tools, i.e. for generating a representation or combining the model with other models.

The issue with the Canonical XMI format is (currently) that most UML tools don't export Canonical XMI but only a custom XMI flavor. MagicDraw and probably Eclipse are the tools which export an XMI flavor closer to Canonical XMI.

A general workaround could be to choose a specific UML tool (or to recommend one) and to write a converter from the specific custom XMI to Canonical XMI. (I did this for the XMI flavor which is exported by Lion).

This way, both would be available, a UML tool for model development and a portable XMI format which can be imported into other UML tools.

Another dimension is the cost issue. Any commercial tool costs something. This might be an issue in the standards environment. Enterprise Architect versions start at 229 USD; additional costs apply to software updates.

Enterprise Architect exports to the UML/XMI version 2.4/2.5. (UML 2.4.1 seems to be the most implemented version, 2.5 is the latest.) The issue is that the exported XMI flavor can't be imported in other UML tools. Only MagicDraw offers a custom import of Enterprise Architect XMI 2.1.

There might be possibilities to get free licenses from commercial tools for standards development. I heard that regarding MagicDraw. The issue here is that companies would probably offer only very few licenses. This way, not the whole MRT group could use the UML tool. Furthermore, other users of the model would need a paid license to use the model.

The free and open-source tool Eclipse Papyrus seems to be a good choice. I suggest to look into the Eclipse UML tools in general. Eclipse UML tools use an own custom format for serialization of models, EMF Ecore (Eclipse Modeling Framework), which is available as XMI. A converter would be required for transforming the Ecore XMI format into Canonical XMI. An additional Ecore serializer could be written for Canonical XMI. This might get support from the Eclipse community.

The whole Eclipse UML tools landscape offers much more. There are tools based on the OMG standards QVT Operational (model to model) and MOFM2T (model to text). These tools would enable a transformation from PIM to PSM, and generation of representation encodings (like XSD and OWL). I looked into this a little. My thinking is described in the attached file.

There are Eclipse tools like EMFStore ("repository to store, distribute and collaborate on EMF-based entities (a.k.a. data or models)") and EMF Compare (comparison and merge facility for any kind of EMF Model). It sounds promising but I didn't have a closer look into this. It would need more exploration.

The DDI 4 model uses only a definitive subset of UML class diagrams. This approach builds the basis to create a robust and easy-to-use model which can be used in multiple environments and which can be represented in multiple encodings (representations). A similar approach should be used regarding the UML tool, i.e. using only core features of the tool. This approach can avoid dependencies.

Cheers

Achim

Provenance and relationship with other standards

Email from Flavio to srg list of 2019-01-11

I mentioned in the MT call that we needed to support different types of provenance/lineage. In particular, I'm interested in the so-called why and where provenance. For definitions, please see "Provenance in databases - why how and where - FTinDB 2009" (attached), Sections 1.1.1 and 1.1.3.

There are many other references, including the original Buneman et al. paper, but this one gives the gist of it.

Can we represent everything we need to capture these types of provenance with Prov-O or other standards? If yes, how? Else, what is missing?

Food for thought.

Flavio

Email from Jay to srg list of 2019-01-16

Here is a review of several provenance models including the so-called W7 model that Flavio is interested in: http://dcpapers.dublincore.org/pubs/article/viewFile/3709/1932

Here are a couple of articles that align one of these provenance models — PROV-O — with sensor data description and the Internet of Things (IoT):

Sensor Data Provenance: SSNO and PROV-O Together at Last

Provenance in Systems for Situation Awareness in Environmental Monitoring

I imagine, in line with suggestions made by Arofan, that we will want to form a working group on provenance that will make recommendations.

I am thinking that the provenance discussion also raises a larger modeling issue: does DDI intend to copy or plug-and-play other models? We have had this discussion before with Dublin Core. But perhaps we may want to revisit it. That’s because now in DDI 4 we have to decide about SDMX. Because of ongoing UN and EU work, SDMX aggregate data description is sometimes a requirement. In DDI 4 we can continue with the nCubes we currently support in DDI 3.x and perhaps perform an SDMX transformation, we can copy “essential" parts of SDMX or we can perhaps plug-and-play with SDMX. This is really a good use case for thinking about how in the future DDI plans to align itself with other models and standards.

Email from Flavio to srg list of 2019-01-16

Thanks Jay. I need to dig deeper on your references to see whether W7 describes provenance at the datum level. Either way, the first reference is a great summary of approaches.

Regarding your question about whether DDI should copy or plug in other models, I tend to lean towards the latter. The large number of use cases and vocabularies out there, most of which are under active development, makes it infeasible for a small team like ours to replicate them in DDI. Besides, there is no need to. We just need to understand the use case, the existing vocabulary we'd like to integrate, and create some minimal anchor objects, if necessary, to be able to plug the vocabulary in. I believe DDI should specialize in a small, well-designed and well-integrated set of classes to cover the aspects of the data (and metadata) lifecycle that other vocabularies either don't cover or cover poorly.

My five cents.

Flavio

Email from Flavio to srg list of 2019-01-16

Another vocabulary we probably need to integrate with is PMML, for predictive models:

http://dmg.org/pmml/v4-1/GeneralStructure.html

Here is an example of what the model looks like, from a work some folks at StatCan are doing on the health domain:

https://github.com/Ottawa-mHealth/predictive-algorithms/blob/master/CVDPoRT/Reduced/Female/model.xml

Flavio


title	Virtual meeting 2019-01-09

ATTENDEES: Arofan, Wendy, Flavio, Larry, Dan G., Jay

Technical Business requirements:
Model needs to do:
- Business requirements need to go into detail on how to use the standards
- Lineage and permanence (data point and record level)
- Provenance capture - why, how, and where provenance of a datum
-- each individual cell and the content of it as well as the record level, set level
-- how the data was changed by analyst such as due to a government shut down or change in algorithm
- This needs to be elaborated because this is not a specific business area requirement but a general need
- Use case - BLS is facing this issue and Dan would like to work with this also
- Jay has been using process model plus SDCL with ALPHA at the datum level through data set
-- building out that capability - idea is to see how well we can map DDI4 Data management view to PROV-O
-- PROV-O is very generic and so seems to work best at the higher data set level so it may be useful to provide more detail
-- Transformation between variables (source, observation, transformation)
Hold a meeting on this topic, drilling down specifically into the model to see how we get down to a datum or up from a datum
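As an illustration of the datum-level requirement above, here is a minimal Python sketch of a derived value that carries the addresses of the source cells it came from and the operation that produced it. All names (CellRef, DerivedDatum, derive_mean, the data set and variable names) are hypothetical and are not DDI 4 classes; the sketch does not attempt the formal why/how/where definitions in the paper cited by Flavio.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CellRef:
    """Hypothetical address of one datum: data set, record key, variable."""
    dataset: str
    record: str
    variable: str


@dataclass(frozen=True)
class DerivedDatum:
    """A value plus the information needed to trace it back to its sources."""
    value: float
    sources: Tuple[CellRef, ...]  # where the inputs were located
    operation: str                # how the inputs were combined
    note: str                     # free-text reason given by the analyst


def derive_mean(cells: Dict[CellRef, float], note: str) -> DerivedDatum:
    """Average the given source cells and keep their addresses with the result."""
    if not cells:
        raise ValueError("at least one source cell is required")
    refs = tuple(sorted(cells, key=lambda r: (r.dataset, r.record, r.variable)))
    value = sum(cells.values()) / len(cells)
    return DerivedDatum(value=value, sources=refs, operation="mean", note=note)


# Example: two source cells feed one derived datum.
source_cells = {
    CellRef("survey_a", "r1", "income"): 100.0,
    CellRef("survey_a", "r2", "income"): 200.0,
}
datum = derive_mean(source_cells, note="illustrative adjustment")
```

Whether this level of detail can be expressed with PROV-O alone is exactly the open question raised in the notes; the sketch only shows the kind of information that would need a home.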

Technical Modeling Style requirements

Organizational design
- Arofan's documents
- proposal is to create a group that replaces the old modeling team plus a coordinating team
-- modeling - technical requirements handling which feeds into representation groups
-- for each representation
-- liaison team (working with projects doing testing) - business requirements
-- documentation
Example: We have a business requirement regarding data lineage in StatsCan which would be a liaison issue feeding into modeling team

Patterns to help in the tooling so to reflect the pattern base in the representations
Patterns need to be discussed from a technical point of view and the business point of view (implementers and content managers have different needs)

Arofan's email summarizing business requirements

I volunteered on the just-ended call to send out an e-mail regarding the business requirements activity which came out of the Berlin Sprint discussions. Until we organize more formally, this will just be a topic on this (the SRG list) and perhaps we can schedule a call if needed. (I have access to a WebEx so we can do meetings without conflicting with normal DDI calls if that helps).

I would like to summarize the things that I am aware of which are relevant to pursuing the creation of a business requirements document which we can put forward for agreement:

(1) I wrote a high-level document of feedback from the Cross-Domain (second) Dagstuhl week, during the Sprint. Jon has taken this and started extracting some basic scope for framing up actual business requirements.

(2) Flavio has volunteered to document some of the business requirements from an official statistical perspective. Jay, as the DDI liaison to UN/ECE, has offered to help him with this.

(3) Kelly identified during the Sprint that the Prototype feedback in fact contains business requirements, which we will need to surface into whatever document we create, at some point.

(4) Wendy asked on today's call that we identify any areas where we think there might be dependencies/points of contact between this work and other efforts within the MRT work that is now shaping up. I can see that there will be some - certainly the requirements for the business purpose served by the UML model itself (and the style of it) are already on the table, from the Dagstuhl feedback document. I am sure there are many others. We will need to focus on this as we move forward.

(5) It was also suggested on the call (was this Jon?) that the whole MRT proposal, along with this parallel effort regarding business requirements, needs to be presented to the leadership of the Alliance for approval/discussion. This suggests that we may need to create a very short document describing what this activity is and why we see it as important.

I am sure other things may be going on in regards this work which I have not mentioned above - please add anything you see as important.

I think it is early days to organize a call, especially with the holidays approaching, but we should at least try to figure out how best to move forward in the interim. One major question is finding out who wishes to be involved. The names mentioned above are clearly interested (Jon Johnson, Jay Greenfield, myself, Flavio Rizzolo, hopefully Kelly as project manager) but who else would like to get involved? I am sure this is a broader group.

If you are interested in helping frame business requirements - not technical requirements - for the DDI 4 work, please respond to this e-mail.

Also, if people think that an organizing discussion before the holidays would be useful, please speak up. We can easily arrange something.

Next week's agenda:
Organizational approach of where we are going and how we organize the approach
Scope, focus, groups, approach proposal
Who else needs to be recruited to make this functional
What is the approval process
OUTCOME: draft for approval

Modeling technical requirements - need to provide a summary for comprehension
Platform questions - approach to addressing this

Sparx Systems cloud modeling approach for UML modeling https://www.sparxsystems.com.au/enterprise-architect/cloud-services/cloud-services.html
Question of production process - where this fits
XMI output - canonical approach


title	Virtual meeting 2018-12-12

ATTENDEES: Wendy, Larry, Dan G., Arofan, Achim, Hilde, Jon, Flavio, Kelly, Jay

Proposal regarding an MRT group to replace or expand the MT
--Needs to be approved as a structure
--Relationship of proposed group with working groups (Data Description, Data Capture, etc.)
--This group can make a proposal to the AG so they can discuss
--Business requirements needed
--If someone from the core MRT team is in contact with a testing team
--Role of each individual needs to be identified
--Arofan thinks he may be able to be more involved, everyone should think about their participation and role, Hilde may be able to join
--At a governance level there is the issue of focusing on the core and whether a year is reasonable. Needs input from the Scientific Board. This is a bit of a re-boot and need to clarify goals for 12 months.
--Prepare to bring into the broader Scientific Board prior to the May meeting
--Focus on short term goals / sprint like
--Get a document out in the near future to
--Making sure that response to Prototype review filters into this
Business requirements document
--Keep as a separate parallel activity for the time being
--Start and then continue detail into February/March
--Having an outline of these requirement should be part of the proposal
Technical requirements - collections discussion, attributes, development process
--Roundtripping, modeling, etc.
Modeling rules for UML - this is needed to define the input and validation to COGS
--This needs to be well thought out in terms of what is the "core of the core" and the principles of what we are expressing in UML
Addressing issues raised in Prototype review and assigned to the MT

What are the real next steps in December and January. Next meeting January 9 (status check) first working meeting January 16
Interest and role in Modeling (MRT) Team should be requested on MT mailings to list over the next month or so. Something on broader list as proposal for group is expanded
The "goals of the group" document should be ready for broader distribution
Technical requirements - Flavio and Wendy
Business requirements - Arofan
Summary of roles etc. - Arofan
Project management - Kelly is discussing with Jared this afternoon, how is project management to go on with this group, we need to start determining dependencies, time requirements, resources should be part of how we work

Email from Achim prior to meeting:
In my understanding following questions should be clarified and would build a good basis for the work next year:

1. Is there an agreement that Modeling, Representation, and Testing (MRT) replaces the existing Modeling Group?
2. Focus on DDI 4 Core, like Conceptual, Data Description, and Process. These areas are important for any use case perspective. Additional areas can be identified according to business requirements. But the focus on a core increases the chances to have a robust and mature deliverable.
3. Description of major tasks regarding major modeling issues
4. Participants and their roles/perspectives
5. Is the proposed timeline for a DDI 4 Core at end of 2019 reasonable?
6. The development of the business requirements document can be worked on in parallel but is not task of this group.
7. Information on these agreements to other groups and DDI Alliance committees
8. Identification of issues which can be worked on in the next couple of weeks independently of group meetings. People will have longer breaks (I’m not available Dec 22 to Jan 13).

The intention is here to find a common ground on which basis productive work can be done.

Modeling Team was on hiatus while the Technical Committee prepared the DDI4 Prototype for review


title	Virtual meeting 2018-03-21

ATTENDEES: Kelly, Wendy, Jay, Larry, Hilde

Agenda:

General Updates on content/files - Kelly et al (10 mins):

Do we have links to everything that has been written
What is currently in Bitbucket in terms of high level content
Documentation of production process and interface with view documentation are two different things
Need compiled document on Views - Modeling, Lion to PIM/PSM, Binding specification
Problems arise in different points in the production process and we need to determine what needs to be dealt with when
Pull together the Modelers documents on Views
Documentation of XML binding of Views - Oliver will update at end of week
Add examples from Sprint page
FHIR documentation https://www.hl7.org/fhir/index.html

reStructuredText examples - Kelly (10 mins):

https://bitbucket.org/ddi-alliance/ddi-views/src/a2c8a3d9fce4d18af096467534ac6a0718e766b8/documentation/src/userguides/variablecascade.rst?at=master&fileviewer=file-view-default

Hilde Questions - Hilde (25 mins):

Issues came up about the model (UML) and views
What should be in the View? file issues if items should be added/removed from view
Images can be instructive


title	Virtual meeting 2018-03-07

ATTENDEES: Kelly, Wendy, Jay, Larry, Oliver, Dan

Modifications due to model changes in documentation:
--reflecting change documentation - name change list TC-41
--Views seem to be less stable
--View documentation on two levels (restricted classes needs that level of understanding to understand schema file or if doing RDF need to
--How do we flag what needs to be reviewed due to changes in model (name changes, content of Functional Views)
--ACTION: Add a flag to the change log to notify where additional documentation review should take place (Wendy - DONE)

Content and format updates
--Get Flavio to review earlier documentation to update
--Reviewed assignments for various documentation objects (see DDI Prototype Documentation MASTER)
--Linking between documents
--There are two ways in Lion (external paste in HTTP link; link from one section in the documentation to another http://docutils.sourceforge.net/0.6/docs/user/rst/quickref.html "inline internal hyperlink targets")
--ACTION: provide an example (Kelly/Jon)


title	Virtual meeting 2018-02-21

ATTENDEES: Wendy, Jay, Larry, Oliver, Kelly

Kelly will present her organization of work so we can see where we are plugging in
--Folder in Google Drive that will contain prototype documentation where it can be worked on and edited (link on project management page)
https://drive.google.com/drive/folders/1-R_Zt_ECCkJmACJnewh9cNP4d9tiAfvG
--Master spreadsheet to track work on Google site
--May need to add more examples
--Can we review Dagstuhl documents for other documents that should go into the folders
--Future work will be with the documentation group but we need to pull in the documents we have
--Make sure all classes in the packages have complete and accurate documentation
--Oliver will create a spreadsheet with all the classes and their documentation
(this will become part of the nightly build so we can pull out and see where we are at a given point)
--Review sheet for specific assignments or for volunteering

Some rules on what type of issues should be filed where so they get addressed by the right group
--Use prototype ONLY for things that have to happen for the prototype
--modeling issues should be filed in TC; Documentation in DOC; RDFOWL, XMI, XML etc. in appropriate trackers, and if it needs to be done by Prototype then add a tracking issue in Prototype

Class level documentation as much as possible in next two weeks

Oliver will check on the transformation issue - XML rendering of regular expressions


title	Virtual meeting 2018-02-14

ATTENDEES: Wendy, Larry, Jon, Oliver, Kelly

Role of dual TC/MT member is TC model review - look at where you can best contribute
TC-3
TC-6

Meeting schedule through June - We'll meet next week and then leave meetings scheduled for every other week, if needed, to verify what has been done, what's being worked on, etc.

Issues found during write-ups for TC
TC-7
DVG-27 (2 newest documents) - don't know just how these are used - how the high level connects to the detail

Want to make sure what we have is the most accurate

Documentation issues - review documents for current accuracy and move to DVG-27

DVG-27
DMT-176
DMT-173
DMT-172
DMT-171
DMT-168
DMT-162
DMT-155
DMT-147
DMT-145
DMT-137
DMT-118
DMT-115
DMT-100
DMT-97
DMT-84
DMT-83
DMT-80
DMT-72
DMT-23
DMT-18

Use cases / examples

DVG-28
DMT-182
DMT-154


title	Virtual meeting 2018-02-07

ATTENDEES: Wendy, Jay, Larry, Dan, Oliver, Kelly

Agenda briefly:
--replacing targets that are pattern classes (mostly related to methodology pattern)
--if we subjectOfDesign ECVE can we set it to a specific value - document that these should match
--extend from methodology overview
--ACTION: make classes of SimpleMethodologyOverview the extension base for other realizations of these classes. Note that this has made several classes with NO additional properties/relationships whose sole use is to limit the target class of a relationship to a specific subtype DONE
--a few quick include/don't include decisions for Descriptive Codebook
--ConceptualInstrument - ImplementedInstrument is the cutoff with documentation that this is where it would link into the DataCaptureInstrument - also keeps in line with what is covered in 2.5
--ACTION: add ones in red make changes to pattern targets DONE
--ACTION: add hasVariableCollection to Study to tie in the use of requested variable collection DONE
--documents needed by documentation group
--DVG-27 is the site for dumping any and all documents, drafts, notes to feed into higher level documentation
--created DVG-28 as a site for dumping use cases, test cases, and examples
--class documentation relation to GSIM and there are lots of issues between DDI 4 and GSIM
--Jay is working with Jason Blackwell (UNECE) on a document covering conceptual and logical models - this should be at least referenced by DDI documentation to help clarify the relationships

what does it mean to "End" at a certain point in, for example, the variable cascade
X1) include the class i.e. InstanceVariable but don't add any un-included relationship targets

2) don't include the class i.e. InstanceVariable and document that this is the linking point to a larger range of packages


title	Virtual meeting 2018-01-24

ATTENDEES: Wendy, Jay, Oliver, Larry, Kelly, Dan

RectangularLayout becomes UnitSegmentLayout

Larry made change and will follow-up with documentation search and fixes

If we are going down to the datum and data point we are missing a way of describing the Set of Units. We can say what the population was but we don't have a means of subsetting by definition. Want to lay out the issue. How would this relate to where VariableStatistics would need to be attached? Possibilities would be use of IdentiferViewPoint, Transformation Processes, possibly creating an Index, or other means of addressing this.

We want to be clear on what the prototype does, what it doesn't do, and issues.

How far up the Variable Cascade
List of Functional Views --
Conceptual Content View : Concepts --- InstanceVariable
Data Description View : Datum --- InstanceVariable
Data Capture Instrument View : Capture --- RepresentedVariable
Custom Metadata View : CustomXX --- InstanceVariable
Statistical Classification View
Structured Geography View
Agent Registry View
Sampling View
Data Management Process View : Include Business Workflow and look for the edges
Descriptive Codebook View : current content and coverage

Use the Variable Cascade as a central hub of where everything plugs in thereby facilitating connection to different parts. Model hangs together how extensions can plug in (different capture modes, different storage modes, etc)

Workflow work is now stable - Business Workflow


title	Virtual meeting 2018-01-17

ATTENDEES: Wendy, Jay, Larry, Oliver, Dan, Kelly

Data Description:
Logical Description - final?
Format Description See DDI4DATA-25

Rectangular - does it also include a CSV as long as all lines have the same number of columns?
Rectangular being all records of the same type and same layout
Rectangular was fixed length
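The condition in the question above (every line of a delimited file has the same number of columns) can be checked mechanically. A minimal Python sketch follows; the function name is_rectangular is hypothetical, and whether such a file should count as "Rectangular" in the model is the open question from the notes, not something the code decides.

```python
import csv
import io


def is_rectangular(text: str, delimiter: str = ",") -> bool:
    """Return True if every non-empty row has the same number of columns."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    # csv.reader yields an empty list for blank lines; skip those.
    widths = {len(row) for row in reader if row}
    return len(widths) <= 1


same_width = "id,age,sex\n1,34,F\n2,51,M\n"
ragged = "id,age\n1,34,F\n"
```

Here is_rectangular(same_width) is true and is_rectangular(ragged) is false. The check says nothing about whether all records are of the same type, which is the other half of the definition discussed above.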

SingleLogicalRecordFile - Flat? UnitRecordFile?
Single logical record - could be multiple physical segments
Can be fixed length OR delimited

MultipleLogicalRecordFile - Hierarchical/Relational

Two things:
How many logical records are in the file
Is it fixed or delimited?
If multiple logical records - hierarchical or "rectangularized"

FlatSegmentLayout
Single type - can have multiple physical segments each must belong to a single logical record (segments are flat within their logical record)

Prototype -
Multiple logical record types
Multiple segments (that can be ordered)
Association between logical records

Cube - dimensional store
Events are another layout - tall, skinny file

ACTION:
Finalize a name of this particular physical record layout
Get a description - of this and vocabulary below
Logical and format we have what we need - double check examples

Vocabulary agreement:
What is the physical layout of a logical record (can have multiple segments)?
PhysicalSegmentLayout
A collection of physical layout formats in a single file?
PhysicalFile
A collection of closely related files (like in a relational data base)?
PhysicalDataSet
A collection of multiple files of many types within a single store?
DataRepository
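The containment implied by this vocabulary can be sketched with plain Python dataclasses. This is only an illustration of the agreed terms: the attribute names (name, delimited, layouts, files) are assumptions made for the example and are not properties taken from the model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalSegmentLayout:
    """Physical layout of a logical record (one of possibly several segments)."""
    name: str
    delimited: bool  # assumed flag: True = delimited, False = fixed length


@dataclass
class PhysicalFile:
    """A collection of physical layout formats in a single file."""
    name: str
    layouts: List[PhysicalSegmentLayout] = field(default_factory=list)


@dataclass
class PhysicalDataSet:
    """A collection of closely related files (as in a relational database)."""
    name: str
    files: List[PhysicalFile] = field(default_factory=list)


@dataclass
class DataRepository:
    """A collection of multiple files of many types within a single store."""
    name: str
    files: List[PhysicalFile] = field(default_factory=list)


person = PhysicalSegmentLayout("person_segment", delimited=True)
household = PhysicalSegmentLayout("household_segment", delimited=True)
person_file = PhysicalFile("person.csv", [person])
household_file = PhysicalFile("household.csv", [household])
data_set = PhysicalDataSet("survey_tables", [person_file, household_file])
repository = DataRepository("archive_store", [person_file, household_file])
```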

ACTION:
Change name of property fileName to physicalFileName

Items 1 and 2 under Agreed are agreed
Item 3 collection of PhysicalSegmentLayouts format a Logical Record

Under Questions:
1 and 2 were agreed to
3 - working on the specific name and documentation
4 - resolved above renamed to PhysicalDataSet

Project Management Question:

How does verification of use cases move into the documentation?
Each use case has description, examples etc. how to file these for incorporation for the documentation group.
Making a list of use cases, examples, what gets handed off and contributed to during documentation period.
DMT-176 place for class documentation
Distinction between test cases and use cases - test cases are useful for describing how things work and testing out for implementing - use cases are broader (ANES, Transformations, etc.)
Use cases relate to what is covered by a Functional View
Make sure there is a place where content for pass off to next group is collected - will add issues to DVG


title	Virtual meeting 2018-01-10

ATTENDEES: Wendy, Jay, Jon, Oliver, Larry

LOGICAL and FORMAT
Issue of data description (logical and physical) - what changes are still possible?
What is a tweak and what is a rework?

When Jay looks at the model now, it and the bindings Deirdra has been working with are out of sync, and there are still some issues to finalize before entering the changes in Lion. There are lots of collections that bounce up against each other and this is being simplified.

Jay has been sending out models of the agreements. He has been testing this model and making examples. DMT-176

One of the big changes we decided on is that Viewpoints hang off the Unit Record Relation Structure. There are no Viewpoint relation records.
Wendy will enter changes in Lion based on the most recent ppt in DMT-176

Simplification resulting from mechanical entry of collections - showed up a number of duplications of activity. These have been cleaned up.

Language of object - DMT-177
Often need to support several languages which is a problem in the XML as an attribute can't be repeated.
Solution has been to create a list of xs:language
Changes were initially made in Lion. These need to be updated in terms of documentation and cardinality. This allows the transformation to be based on the use of the datatype.
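To make the constraint concrete: an XML attribute can appear only once on an element, so several languages have to travel as one whitespace-separated list value, which is how an XML Schema list of xs:language is written. The Python sketch below reads and validates such a value. The element name Label and attribute name languageOfObject are hypothetical placeholders, not names confirmed by the model; the regular expression is the lexical pattern XML Schema defines for xs:language.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Lexical pattern of xs:language as defined in XML Schema Part 2.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def languages_of(element: ET.Element, attribute: str = "languageOfObject") -> List[str]:
    """Split a list-typed attribute into language tags and validate each one."""
    tokens = element.get(attribute, "").split()
    invalid = [t for t in tokens if not XS_LANGUAGE.fullmatch(t)]
    if invalid:
        raise ValueError(f"not valid xs:language values: {invalid}")
    return tokens


# Hypothetical instance fragment carrying three languages in one attribute.
label = ET.fromstring('<Label languageOfObject="en fr-CA de">Age</Label>')
tags = languages_of(label)
```

A transformation keyed on the datatype, as described above, would only need to recognise the list type to know that the value must be split this way.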

Move CDE to separate package
Move Catalog items to separate package
Evaluate time needed to update CustomVocabulary - create a meaningful view - need a use case creating Controlled Vocabulary publication option

Workflow/Process -
2nd view to support transformation
Look at GSIM business process to see what is covered/what not - BTL - GSIM information model
Main difference now has to do with collections vs nodes

NEXT MEETINGS

Get data description nailed down in the next 2 weeks. Need to finish views and enter them by January 31.

Jay and Larry will talk about how Format relates to logical and present next week.


title	Virtual meeting 2018-01-03

ATTENDEES: Wendy, Larry, Jay, Dan, Oliver

DMT-182: Format structure issues raised in working on Use Cases. Data Description issue will discuss at next meeting.
Get resolved by mid-January for update prior to end of month

Darrin's example - can Jay see what you're doing? Wendy will email to get his RDF

Wendy will go through remaining views and potential view and draft for discussion

DMT-148 - looked at specific issues (skip those in Qualitative)
Raised the issue of Common Data Element - difference with RepresentedVariable
Need to determine if this should be in Prototype before the end of January

Final review piece is to identify those ComplexDataTypes that are total orphans - find them and isolate - Oliver will write script for validation of this problem

Script to identify orphans in general - with package information - Oliver
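The orphan check described above amounts to a set difference between declared datatypes and the types referenced by other classes. A minimal Python sketch under stated assumptions follows: the model is represented as two plain dictionaries, and every class, datatype, and package name in the example is made up for illustration. It does not reflect how Oliver's script reads the Lion export.

```python
from typing import Dict, List, Tuple


def find_orphans(
    datatypes: Dict[str, str],
    usages: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (package, datatype) pairs that no other class or datatype references.

    datatypes maps a datatype name to its package.
    usages maps a class or datatype name to the type names of its properties.
    """
    referenced = {
        type_name
        for owner, type_names in usages.items()
        for type_name in type_names
        if type_name != owner  # a self-reference does not rescue an orphan
    }
    return sorted(
        (package, name)
        for name, package in datatypes.items()
        if name not in referenced
    )


# Illustrative input only; these are not the real model contents.
declared = {
    "TypeA": "PackageOne",
    "TypeB": "PackageOne",
    "TypeC": "PackageTwo",
}
used_by = {
    "SomeClass": ["TypeA", "xs:string"],
    "TypeA": ["TypeB"],
    "TypeC": ["TypeC"],
}
orphans = find_orphans(declared, used_by)
```

With this input only TypeC is reported, together with its package, because its sole reference comes from itself.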

ISSUES for resolution before end of January

Extensions to Workflow for Prototype - Jay will gather information, we'll create issue and determine what needs to be discussed

Add lists of classes needing documentation in other packages in Prototype


title	Virtual meeting 2017-12-20

ATTENDEES: Wendy, Jay, Larry, Dan, Oliver, Kelly

Workflows:
Results of Jay's review of workflows as a basic realization of Process. Need to capture transformations as ETLs rather than statistical packages. DDI4 metadata in workflows is being used to try out and test work in Pentaho. Using this to identify any new or modified classes for workflow. ComputationAction is one of the subtypes of Act (others are related instrument components for a data capture instrument). It can capture code. What is needed is a means of capturing a clear structured description in XML. The approach being used, instead of capturing code, is the creation of XML that is fed into a machine to run the recode.

Pentaho etc. start with metadata and produce the transformation using that as a driver. Suggest the use of a MetadataDrivenAction. Add the ability to add a correspondence table which would hold the relationships used for recode. The user creates the correspondence table (e.g. the IPUMS transformation table). Addresses joins, recodes, renames.
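The idea of driving a recode from a correspondence table rather than from hand-written code can be sketched in a few lines of Python. This is an assumption-laden illustration: apply_correspondence and the sample variable names are hypothetical, and nothing here is the Pentaho API or the proposed MetadataDrivenAction class.

```python
from typing import Any, Dict, Iterable, List, Optional


def apply_correspondence(
    records: Iterable[Dict[str, Any]],
    source_variable: str,
    table: Dict[Any, Any],
    target_variable: Optional[str] = None,
    default: Any = None,
) -> List[Dict[str, Any]]:
    """Recode source_variable through the table, writing to target_variable.

    If target_variable is omitted the source variable is overwritten.
    Values missing from the table are mapped to default.
    """
    target = target_variable or source_variable
    result = []
    for record in records:
        recoded = dict(record)  # leave the input records untouched
        recoded[target] = table.get(record[source_variable], default)
        result.append(recoded)
    return result


# The correspondence table is data supplied by the user, not code.
sex_table = {"M": 1, "F": 2}
rows = [{"id": 1, "sex": "M"}, {"id": 2, "sex": "F"}, {"id": 3, "sex": "?"}]
recoded_rows = apply_correspondence(rows, "sex", sex_table, target_variable="sex_code")
```

Because the mapping lives in the table, the same metadata can document the recode and execute it, which is the point made in the discussion about generating code from the algorithm rather than recovering the algorithm from code.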

Question: Are you aiming at being able to roundtrip and generate the Pentaho from the DDI? Yes, we can do that. It's Java based. These systems also support formulas and can run scripts. They still want to use their statistical packages to do certain things. Don't want to do statistical analysis in a data management environment. Every time they changed data management they had to rewrite the Stata code because no one could understand another's code. If you capture the algorithm and generate the code, as opposed to trying to derive the algorithm from the code, you can see what is going on within the eventual code.

Jay will send to Wendy and Wendy will put in to verify that what Jay is proposing is clear.

LION content:
reviewed and agreed on package/view dispensation DMT-140

Jay to send Kelly a write up of the decisions made in Dagstuhl regarding the logical data description, some of which have not yet made it into the model. Wendy to follow up with Jay about creating an issue for integrating these decisions in the model.


title	Virtual meeting 2017-12-13

ATTENDEES: Wendy, Larry, Jay

Reviewed the work plan for December 2017 through January 2018 covering the MT review of the Prototype. Posted on DMT-175


title	Virtual meeting 2017-11-29

ATTENDEES: Wendy, Larry, Dan, Oliver

DMT-141 - change xs:string to ECVE (entered)
DMT-134 - resolved - in entering changed "broadest" to "highest" to clarify relationship to "lowest"
DMT-66 - resolved (entered)
DMT-144 - generatedBy target Act - target 0..n change after reading content target is 0..1 with a note on how to handle a series of Acts to generate the content of the InstanceVariable
DMT-148 - use base class in very common relation names like "contains"


title	Virtual meeting 2017-11-22

ATTENDEES: Wendy, Larry, Jay, Oliver

Annotation

Document Information
Dealing with documents in RDF - Ben and Oliver discussion
Every triple store already handles triples as quads, as these represent the major box the triples come from. We could use that to identify meaningful usage as archival documents (for instance at GESIS, where such documents are put into long-term preservation for recreation of databases); we would be safe if we also carry those specific documents into those "box identifiers" in RDF. We could bring document information into the RDF world with the connection to the corresponding triples. We talked about that and found that it is currently not covered in the LOD research community. Bring this to the conference in Greece (paper deadline in January) - Oliver and Ben, plus Jay and Larry as co-authors, plus asking Eric. This is an interesting and solid use case for that approach.

Implications for document information - none at the moment - could leave as in and recommend that for certain RDF instance bindings

Right now document information is put into all Views - should this be a deliberate selection based on the need for a persistent document rather than an interchange
Would there be cases where you'd want to use the codebook view just for exchange - even a pure interchange view could make use of document interchange properties

FOR PROTOTYPE:
Leave as it is for now
Raise the question of whether this should be available on all views
What would link the DocumentInformation to the rest of the information?

How would you use this information in a SPARQL query? For instance, if you already know something, you could query based on a study series. You know about this one document and you want to find the study identification to find variables. For most cases you would just use it for retrieving the provenance information on the metadata.

Annotation usage:

What annotations apply to - the metadata object
Annotation of the document information is about the - review document and Document Information and looking at a specific example. Where does this piece of information about a codebook go. Australian document that Larry is doing is a good example.
DECISION: Larry will check to see if there is any major issue that needs to be addressed ASAP. Otherwise, this could continue on through MT review.

Process Model and Methodology:
Jay's ppt on DDI Methodology as a Data Management plan as a traversal over a GSBPM/GLBPM once or in a series
Extend the workflow process with new properties that support those different types of workflows (series) examples
Jay will update and send out via DDI-SRG list to MT
We want to identify one or two possible extensions of Workflows to support data processing, GSBPM management etc.
Didn't seem to change what Larry was doing because there was the Methodology Overview which allows discursive rather than specific processes
Maybe just the Study workflow, Project management/workflow, more than "data", work plans, Process management


title	Virtual meeting 2017-11-15

ATTENDEES: Wendy, Jay, Larry, Dan G., Oliver

ACTIONS:

Dan G. - review the issues at the bottom of the list with ? (100, 115, 141) and confirm that they have been addressed by Data Description. Any documentation should be added to these issues over the next month or so in order to make sure it is available -

These have been identified as documentation and Larry will work on providing a clear story about how these work together during the documentation period of prototype work

All - DMT-157 I've reviewed and made comment so I think we can agree and resolve it. If there is no dissent then I'm happy to enter that change.

OK resolve

Wendy - will try to get geography structures ready to look at next week

Geography is almost done. It extends CodeList: 2 questions
1) Can a Unit have only a single Unit Type? (why?) - some units are in samples in many surveys expressing different Unit Types. However, the Unit Type is so generic that it's not as much of an issue. Extensional definition: you list all of the available kinds, where you have a list of things that actually defines it. Concepts can be roles rather than kinds, distinguished by attributes rather than use.

2) Would an abstract base for a CodeList/Statistical Classification/etc. be easier to handle (limits extension depth, opens up for other forms of signifier/signified options)
file and raise again after prototype

DMT-159 - resolved

Annotation issues:
What is a document? When is it a document (in XML? in RDF?)? what is being annotated? what is being cited? Differentiation between say "creator of the content" and "creator of the XML binding of the content". Is it annotation of the metadata or the object described by the metadata?

Access issues: access to data, access to metadata, persistent access restrictions/rules, local restrictions/rules

In RDF you'd have some general information about a triple store but would not be able to distinguish the different sources that different triples come from. If you use quads you can have identification information on the triples. You are not able to say "give me the document root"; you'll always land at the level of the triple store, not the package of related triples.
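The difference between triples and quads can be shown without any RDF library. In the minimal Python sketch below a quad is a plain tuple whose fourth element names the graph ("box") a statement came from; all identifiers such as ex:codebookA are made-up examples, and the helper names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]

# Each statement records the graph (source document) it belongs to.
quads: Set[Quad] = {
    ("ex:var1", "rdfs:label", "Age", "ex:codebookA"),
    ("ex:var2", "rdfs:label", "Income", "ex:codebookA"),
    ("ex:var1", "rdfs:label", "Age", "ex:codebookB"),
}


def triples_from(store: Set[Quad], graph: str) -> Set[Triple]:
    """Return the package of triples that came from one source graph."""
    return {(s, p, o) for s, p, o, g in store if g == graph}


def sources_of(store: Set[Quad], triple: Triple) -> Set[str]:
    """Return every graph in which the given triple was asserted."""
    return {g for s, p, o, g in store if (s, p, o) == triple}


# Dropping the fourth element merges everything into one undifferentiated set.
triples_only: Set[Triple] = {(s, p, o) for s, p, o, _ in quads}
```

With quads, sources_of(quads, ("ex:var1", "rdfs:label", "Age")) returns both codebook graphs, while the triples-only set has collapsed the two assertions into one and can no longer say where either came from.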

https://www.w3.org/2001/12/attributions/

RDF provides no model-level division between data and metadata

Jay and Oliver will work on this

Workflows is on the agenda for next week

title	Virtual meeting 2017-11-08

ATTENDEES: Wendy, Jay, Larry, Dan G., Oliver

spreadsheet used in discussion

Assignments for next week

Dan G. - review the issues at the bottom of the list with ? (100, 115, 141) and confirm that they have been addressed by Data Description. Any documentation should be added to these issues over the next month or so in order to make sure it is available

All - DMT-157 I've reviewed and commented, so I think we can agree and resolve it. If there is no dissent then I'm happy to enter that change.

Workflows is on the agenda for next week

Wendy - will try to get geography structures ready to look at next week

All - review items associated with Annotation/Citation, we need to determine what must be addressed for prototype and what if anything can be delayed. Also, what is modeling and what is documentation.

Oliver - add an issue to this Annotation/Citation set that addresses the issue identified in Codebook meeting as well as fuller documentation

title	Virtual meeting 2017-10-04

ATTENDEES: Wendy, Jay, Larry, Oliver, Dan G.

Codebook will be reviewed for new classes, changes of identifiable to Complex Data Type, etc.

Change name of CodeItem to CodeIndicator (done)

CodeList - will always have contains with CodeIndicator and may have isStructured: ClassificationRelationStructure which points just to category (done)

Instructions should indicate that you must use contains: CodeIndicator for simple and structured CodeLists. If the CodeList is structured use isStructuredBy: ClassificationRelationStructure to provide additional information on complex structure (done)
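
The rule above can be sketched as a small data structure. This is an illustrative simplification, not the DDI4 model itself: the field types, the string stand-ins for categories, and the `is_structured` helper are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CodeIndicator:
    code: str      # the code value, e.g. "1"
    category: str  # the category it designates (simplified to a string)


@dataclass
class ClassificationRelationStructure:
    # (parent category, child category) pairs; refers only to categories
    parent_child: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class CodeList:
    # "contains" is always used, for simple and structured CodeLists alike
    contains: List[CodeIndicator]
    # only present when the CodeList has additional complex structure
    is_structured_by: Optional[ClassificationRelationStructure] = None

    def is_structured(self) -> bool:
        return self.is_structured_by is not None


simple = CodeList(contains=[CodeIndicator("1", "Yes"), CodeIndicator("2", "No")])
structured = CodeList(
    contains=[CodeIndicator("A", "Agriculture"), CodeIndicator("A1", "Crops")],
    is_structured_by=ClassificationRelationStructure([("Agriculture", "Crops")]),
)
```

Both lists use `contains`; only the second adds the optional structure.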

LogicalDataDescription
issues with LogicalRecord and LogicalRecordLayout. Need to clear up critical content early next week.

title	Virtual meeting 2017-09-20

ATTENDEES: Wendy, Larry, Jay, Oliver

XMI and definition of default values and regular expressions - Larry will send Oliver some XMI examples

Start entering following Flavio's review pattern and then realize where needed

Add ability to create a variable group and a statistics group (has to relate in some way to a data file)

Codebook is down to finishing up relationship to DCAP and whether to include a relationship to Concept

title	Virtual meeting 2017-09-13

ATTENDEES: Wendy, Jay, Larry, Dan G., Oliver

Codebook View Review:
Status of DMT issues required for Codebook - reviewed, revised list, assigned
Status of Codebook group work
is it ready for review
What materials will be coming from the group - documentation, examples, etc.

Looked at comments assigned to Codebook during last comment period. All but 2 resolved, plan is to revisit at next meeting in 2 weeks. Can review what is there with recognition of work remaining. Checking on moving meeting up a week to help meet deadline.

Can I enter collection and realization changes for those classes used by Codebook

title	Virtual meeting 2017-07-26

ATTENDEES: Wendy, Jay, Oliver, Larry, Dan G.

Current state of Collection revision:

Jay spent some time with Dan to walk through it. What came out of this was that we wanted to organize the thing a little more so that the statistical classification fell out of the code list which fell out of the classification set.
The other was do we want to represent the relationships the way Wendy was representing them there or just have views of the relations to represent the relationship and kind of deal with relationships separately.
How well does this reflect GSIM, and how much is fine tuning or gratuitous remodeling? There seems to be a more straightforward way of doing things. We want to provide something that is transparent and intuitive. We want people to look at this and say "Oh that makes sense".
Dan didn't have time to work on stuff so didn't get into the weeds.
Jay provided a roadmap. How does this relate to the representations package?
It was pretty clear that we began with realization and a means of simplifying it.
We need to be able to explain relationship to GSIM and the Node/NodeSet was a means of being able to attach each of the basic things and then work on the details.
We have to be able to show a clean map from GSIM to DDI.
Not everything that is in the "pattern" package should be in the pattern. Right now that is muddied by having to put these into the same package. It is "OK" not to have it in the pattern because WE are the ones that are building the realizations not the end user.

Where do we go from here

Wendy needs to send out her notes
Jay and Dan need to work this out
We need to agree on some kind of road map
This is fundamental stuff and we need to get it nailed down before Dagstuhl

We need to have a clear workplan and priorities over the next 18 months

As we're working on this can we create mini-examples.

Oliver will create a means for us to make builds.

title	Virtual meeting 2017-07-12

ATTENDEES: Wendy, Jay, Dan G.

Discussion of collection realizations:

Statistical Classification is the same as a CodeList
Not necessarily in practice
Statistical Classification needs to be mutually exclusive and exhaustive as well as managed
Extension of Statistical Classification from CodeList would support use of Statistical Classification as enumerated value domain
The difference should be between managed and unmanaged; mutually exclusive and exhaustive are boolean features.
Add spatial relations for use as specialized classes of base relations
If you inherit from CodeList you can use all of these
The distinction between category set and codelist is having designations (signifier associated with signified)
A signifier is currently a string in a Code but could be an image or sound etc.
Dig deeper into designation down the road if we want to pull things together in this way.
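
The two boolean features mentioned above can be checked mechanically. The following is a minimal sketch under the assumption that a classification is represented as a mapping from category name to the set of members it covers; the example categories are hypothetical.

```python
def is_mutually_exclusive(categories):
    """True if no member appears in more than one category."""
    seen = set()
    for members in categories.values():
        if seen & members:
            return False
        seen |= members
    return True


def is_exhaustive(categories, universe):
    """True if every member of the universe falls in some category."""
    covered = set().union(*categories.values())
    return universe <= covered


universe = {"employed", "unemployed", "student", "retired"}

# Behaves like a Statistical Classification: both features hold.
classification = {
    "In labour force": {"employed", "unemployed"},
    "Not in labour force": {"student", "retired"},
}

# Behaves like a plain CodeList: categories overlap and leave gaps.
code_list = {
    "Working": {"employed"},
    "Studying": {"student", "employed"},
}
```

Whether something is managed is an organizational property and is not captured by a check like this.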

Jay and Dan G. will walk through the realizations next week. Will also have a discussion during TC meeting period this week as there is no TC meeting

Wendy will review what extending Statistical Classification from CodeList would look like. Also explore implications of extending Unit Type, Universe, Population from Concept

title	Virtual meeting 2017-07-05

ATTENDEES: Wendy, Jay, Oliver, Larry

Went over emails regarding collection model and realization
Talked about Universe, Population, UnitType being a subtype of Concept (using concept sets to describe collections of them)
CategorySet to CodeList to StatisticalClassification

What needs to get done for CodeBook:
Statistics - Variable and category level (DataDescription)
Vocabulary for types of methodologies - we have the generic but need some documentation probably flesh out what Sanda did in a formal document

Documentation of View capture:
Want to do documentation at the view level which is more for users of view but we don't really have the resources at the moment to add class level documentation. Figure out what kind of document is needed and what format we should use now. See what format we have and then put documentation in that format and find anything that doesn't have a home.

title	Virtual meeting 2017-06-28

ATTENDEES: Wendy, Oliver, Jay, Dan G.

Proposed game plan discussions

Software version of the content is good but there could be other drawbacks

Documentation of Views:

YAML structures were discussed, putting these examples into Lion
Be able to have instances to prove and use as examples

Patterns:
Update on realizations and XML for Collection Pattern

Classification - maybe start with CodeList and then add levels
Make a view for realizations
Workflow
Logical/Format description
CustomValues

Signification Pattern (new issue DMT-137 describing task).

Please review model in terms of locations where signification pattern should be realized.
We need to have clearer rules in terms of realization
Many classes, such as concept, realize multiple pattern classes and we need to be clear on how these complement each other or conflict (if they do so)
The extent to which we use a pattern - there are instances where the pattern applies but may be of such a character that we aren't realizing it (e.g. Identification, where we can specify how IDs are formed by relationship to the pattern, but which to most seems pretty straightforward). We don't want to twist people into knots by forcing strict realization of a pattern. There is a difference between the conceptual model, creating the binding, and implementing the binding.
The big issue is whether people want to actually model it. At what level is that made invisible - at the point of binding or the point of an instance.
The sweet spot is in doing metadata management. You can start looking at ties between identifiers and how they are formed.
How do we go about exploring this in an efficient way? The Lawrence paper should be linked to DMT-137
How it's currently modeled is tied to nodes, so if we lose nodes there isn't a representation to hang our hat on.
In walking from the abstract to the concrete it had to do with the modeling. Signification would need to be pulled into at least realizations of part of the collection. May cause the review of the rule that pattern classes can't realize other pattern classes. They extend.

APPROACH:

Re-look at the material we have
Review the use of Nodes and effect on signification on that
Create a realization of Statistical Classification or CodeList which use both collection and signification
Evaluate signification within this
Use signification only where it is useful
Better definition of when signification should be used and where it may be superfluous

title	Virtual meeting 2017-06-14

ATTENDEES: Wendy, Dan G., Jay, Larry

Review of revised collection pattern as found in NewCollectionPattern

There is the idea of a collection and then that a collection is made up of a structure
Base collection (abstract) then specializes to an unordered, strictorder, orderrelation.
There is the collection which can be of 3 types - but we want a means of describing the entirety of the
Too much stuff in the base class - separate the thing from the underlying structure
What we want is a name of the whole collection just at the root
A "proposed collection" that then related to information about the collection and then that contains the structures
Creates a bag of bags as an entry point to the collection

Move from BaseCollection to NewCollection

type 1..1 CollectionType Binary choice of Bag or Set
name 0..n Name A linguistic signifier. Human understandable name (word, phrase, or mnemonic) that reflects the ISO/IEC 11179-5 naming principles. If more than one name is provided provide a context to differentiate usage.
purpose 0..1 InternationalStructuredString Explanation of the intent of this collection. Supports the use of multiple languages and structured text.
usage 0..1 InternationalStructuredString Explanation of the ways in which some decision or object is employed. Supports the use of multiple languages and structured text.

Add to NewCollection

(make a note that this SHOULD be realized as AnnotatedIdentifiable)
isDefinedBy points to base collection which is the root bag
[totality, semantics, hasRelationSpecification]
class by definition is the master collection
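
The four properties moved to NewCollection can be sketched as a simple class. This is only an illustration of the listed cardinalities: `Name` and `InternationalStructuredString` are reduced to plain strings, and `isDefinedBy` is omitted because its target is still being worked out in the notes above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class CollectionType(Enum):
    """Binary choice of Bag or Set."""
    BAG = "Bag"
    SET = "Set"


@dataclass
class NewCollection:
    type: CollectionType                           # 1..1, required
    name: List[str] = field(default_factory=list)  # 0..n (Name simplified to str)
    purpose: Optional[str] = None                  # 0..1 (structured string simplified)
    usage: Optional[str] = None                    # 0..1 (structured string simplified)


# Hypothetical instance: only the required property plus one name.
occupations = NewCollection(type=CollectionType.SET, name=["Occupations"])
```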

Follow-up from Lawrence KS..who's working on what:

Jay..ppt continuing (waiting on realizations to complete) - what happens to realizations of the process pattern
Jay..beginning to see the work Chifundo's doing as a means of road testing some of the pattern stuff we're doing
Larry..will work on examples when test realizations are done
Dan..probably has something outstanding and will work on test realizations

Related work:

Qualitative and custom should be cleaned up (Dagstuhl)
Dan is working on LIM

title	Virtual meeting 2017-06-07

ATTENDEES: Wendy, Jon, Jay, Oliver, Dan, Larry
Issues from Lawrence

DMT-134 resolved - in review
DMT-133 resolved - in review
DMT-132 documentary, Oliver will clarify process and add to documentation of bindings (also need to add to property documentation - use of name "content")

Collection Pattern:
Singletons

A singleton is a bag so we can pick an element; one thing that we were throwing around was the idea of having a shortcut representation of where your domain would be a bag. Our language is such that it is really talking about a single thing to a bag.
If it's easier to do this by pointing to a member than to a singleton bag, do that. Worry about the representation when we get there.
If you have something that is pointing to an object and you want it to another object
Proposed collection example: what elements in the range are the target of the relationship
Larry's hierarchy issue of breaking relationships
One of the advantages we have here is that we are talking about a pattern, so in talking about specific realizations we need to realize in a way that prescribes its usage which say forces them to use a parent/child or part/whole relationship
The proof will be when we start looking at realizations
We need to think about what it means to have a relationship to the relationship to the specific items or to the items in the inner relationships - clarify this in documentation
The domain of the inner relationship should be the target of the outer relationship
Hierarchies are created a level at a time (if transitivity applies all the way through you're OK, if not, this could be an issue)
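
The point about building hierarchies a level at a time can be sketched as follows. It assumes each level is a set of (parent, child) pairs and that the relation is transitive; the place names are hypothetical example data.

```python
def transitive_closure(pairs):
    """Return all (ancestor, descendant) pairs implied by parent/child pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Each level is defined on its own, one relationship at a time.
level_1 = {("World", "Europe")}
level_2 = {("Europe", "Norway")}

# Only if transitivity applies all the way through can the levels be chained.
closure = transitive_closure(level_1 | level_2)
```

If the relation is not transitive across levels, the derived pair ("World", "Norway") would not be justified, which is the case flagged as a possible issue.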

Layout for work this summer (prep for Codebook development review)

Oliver: August all
Dan: August most
Jon: end of July and all of August
Larry: teaching in June in afternoons
Jay: generally available
Wendy: July 19th

ACTION: Wendy will review work and draft summer work plan and send out for comment

title	Virtual meeting 2017-05-10

ATTENDEES: Wendy, Jon, Jay, Dan G., Oliver, Larry
RESOLVED

DMT-99 - Should identifiable have a derivedFrom property
DMT-78 - Scope of International Identifier should be broadened (additional material added for consideration post meeting)

Discussed and determined to be part of a larger issue on access restriction

DMT-90 - Should there be additional properties at the annotation level

Remaining

DMT-26 - Reg expression serialization in model should support multiple bindings

title	Virtual meeting 2017-05-03

ATTENDEES: Wendy, Oliver, Jay, Larry
to do before sprint
DMT-99 - Should identifiable have a derivedFrom property
DMT-78 - Scope of International Identifier should be broadened
DMT-90 - Should there be additional properties at the annotation level
DMT-26 - Reg expression serialization in model should support multiple bindings
Actions taken:
DMT-16 - Incomplete list of xs:datatypes in primitive RESOLVED
DMT-66 - Population time space and Unit ON HOLD [need to complete some spatial issues first]
DMT-12 - Review use of URN as opposed to URI RESOLVED [moved to RDFOWL]
Moved to sprint
DMT-105 - Abstracts in models: usage rules
DMT-109 - Create a clear definition of what a view is
DMT-112 - Review ExecutionPair in ComplexPattern
DMT-72 - DocumentationInformation

title	Virtual meeting 2017-04-26

ATTENDEES: Wendy, Jay, Larry
Variable cascade - Dan is not available today

Collections

Asymmetric Relation is a bit of a puzzle. It doesn't make sense mathematically. Ordered tuples as represented in graphs don't have a source and target. A vector is an operation on a tuple. Lisp has an operation which returns the beginning and the rest, which is how you distinguish the roles of members in the tuple. Other languages don't work like Lisp and have different operations, like a linked list or doubly linked list.
There may be a way to accomplish this in a simpler way.
What Dan said: when we model patterns and do more process work and fewer methodology things, we want to think about the temporality of things. We want to think about things that go back and forth (from a study or a measure, then changing as we learn more and execute it, when we become interested in execution and what happens during it, and then when we refocus on dispersal, use, understanding, replication), with platform independent being human readable and platform dependent being actionable/executable.
Distinction between algorithm and process. Went round and round and maybe no distinction. If you describe a coding operation and the algorithm but in doing it you used a black box procedure in a specific software.
The example that comes up in a statistics class in determining error
Realization: If you want to describe the actual execution you're stuck with binary relations because you need to talk about the interfaces between steps. But do we need that everywhere for everything? Can we describe things that are not so entangled in a simpler way such as the flow of a questionnaire. We have to know at what point we need the binary relations.
We'll pull this together now on the issue and ask for Flavio's reaction. Then nail down during Sprint. (use DMT-116) Underlying model of DDI 4.
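
The Lisp-style "beginning and the rest" operations mentioned in the Collections discussion above (`car` and `cdr` in Lisp) can be sketched on Python tuples. The function names and the ("source", "target") pair are illustrative assumptions, not part of the model.

```python
def first(members):
    """Return the first member of an ordered tuple (like Lisp's car)."""
    return members[0]


def rest(members):
    """Return everything after the first member (like Lisp's cdr)."""
    return members[1:]


# The tuple itself has no "source" or "target" slot; the roles come
# only from which operation is used to reach each member.
pair = ("source", "target")
source = first(pair)
target = first(rest(pair))
```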

Sprint prep:

What gets addressed?
What outcomes are required?
What do we need to do in preparation?

to do before sprint

DMT-105
DMT-72
DMT-16
DMT-109
DMT-66
DMT-12
DMT-99
DMT-78
DMT-90
DMT-112
DMT-26
DMT-91

...

Modeling Team was on hiatus while the Technical Committee prepared the DDI4 Prototype for review