

Materials from External Review work at Dagstuhl Sprint


DDI4 MRT Virtual meeting 2019-01-30

Agenda (as in invite of 2019-01-28)

Goal: To have a draft program of work for NADDI Sprint (to submit to EB), and an initial list of proposed tasks.

  1. Description of what is needed for organizing NADDI Sprint (from Achim, with possible draft planning document?)
  2. Discussion of work topics/tasks sufficient for initial organization/NADDI Sprint

     Suggestions:

     - Documentation of datum-based model application to examples of event data, aggregates, etc.
       • Agreed list of data types to be worked on
     - Existing modeling technical requirements/issues
       • Simplification of the model (i.e. less inheritance and less specialized classes)
       • Review of collections (use of appropriate UML properties, use of collections throughout the model)
       • Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
       • Review of views (definition and effective use of subsets of the model)
     - Others

  3. Plan for addressing infrastructure tasks (modeling tools, production framework & process, testing groups/liaison, etc.) to support immediate and longer-term tasks

     - Are there ideas/candidate tools which need to be written up/further explored?
       • Current status of production platform post-Berlin
     - Identify testers and potential testers

(We may not get this far, but if we have time it would be good)

Minutes DDI4 MRT Virtual meeting 2019-01-30

Attendees:  Achim, Arofan, Dan G., Flavio, Hilde, Jay, Larry, Oliver, Wendy

Apologies: Jon

1. Description of what is needed for organizing NADDI Sprint (from Achim):

A goal for the meeting was to have an agreed document on the NADDI Sprint planning ready to send to the AG, both to inform the discussions at their next meeting and to apply for funding of the possible sprint.

Achim prepared the document ‘NADDISprintPlanning.docx’ and sent it to the srg list in advance of the meeting. This was a shell document; some of its content needed to be filled in, or reviewed and agreed, during the meeting.

The meeting was structured in three parts: 1) Topics for the possible NADDI Sprint; 2) Review of possible participants and funding; 3) Other organizational issues regarding the possible NADDI Sprint.

1) Topics for the possible NADDI Sprint (see point 2 in the agenda)

a) Documentation of datum-based model application to examples of data structures (which data structures to focus on at the Sprint was to be discussed and agreed at the meeting).

b) Discussion and possible resolution of structural model issues:

  • Simplification of the model (i.e. less inheritance and less specialized classes)
  • Review of collections (use of appropriate UML properties, use of collections throughout the model)
  • Review of design patterns (relationship to acknowledged software design patterns, relevance of design patterns for users of the model and of the representations)
  • Review of views (definition and effective use of subsets of the model)

Status: Points a) and b) agreed as topics for the possible NADDI Sprint.

Discussion and agreements regarding example data structures a)

Example data structures (point a) were discussed after the structural model issues (point b).

The discussion of which data structures to use as examples at the possible NADDI Sprint centered on whether to focus on common cases or on complex and corner cases.

Dan G. pointed out the importance of modelling complex cases, as more common or simple cases would then be solved at the same time. Others pointed out that issues could occur even if a similar approach is used. Agreement was reached to focus on the common cases as a preparation for the possible NADDI Sprint.

Status:  Agreement was reached to focus on the following common data structures for the possible NADDI Sprint:

  • Rectangular data
  • Event data (wide and narrow data)
  • Single datum points
  • Multidimensional data like data cubes and aggregate data

Arofan and Wendy pointed out that the Variable Cascade documentation (provided, for example, in the Variable Cascade presentation from the Dagstuhl workshop DDI Train-the-Trainers 2018) indicates the style and level of information needed for documentation.

Status: Wendy will add this as a prototype review comment.

After NADDI, further data structures may be explored, for example NoSQL (non-SQL) data such as Hadoop data, graphs, etc.

Status: Agreed

An example from the discussion, provided by Larry, is given in the Appendix.

Discussion and agreements regarding structural model issues b)

Structural model issues (point b) were discussed before the example data structures (point a).

Conceptual resolution/MRT: Jay raised the question of whether structural issues could be resolved conceptually or by using the MRT approach. Flavio pointed out the need to look at many different examples in order to examine structural modelling issues. Achim indicated that this could be a topic for the possible face-to-face meeting and something for a work group to focus on in advance.

Complexity of the model: Flavio commented that the model is complex because it is made complex. It has multiple levels and covers both common and domain-specific needs. To make the model easier to understand, some of the content could, for example, be hidden from specific user groups.

Achim asked whether the model could be improved by focusing on questions like:

  • What is really the core?
  • What are the fine-grained details?
  • What are domain specific things?

Work regarding the complexity of the model could be done in advance and brought to the sprint.

Review of views: The revision of Views is important. Achim pointed out that even a simple view like the Agency view pulls in a lot of classes.

Flavio pointed out that Views are complex because they are currently designed to cover multiple dimensions. The Classification View, for example, is meant to cover reuse, classification management and publishing. This and other views would need to be separated into smaller sets to be easier to understand.

Larry noted that the model is currently highly connected, but that good documentation can help with understanding it.

Status: Agreement to focus on the four bullet points under b) above for the planned Sprint. Tasks should be broken up as much as possible. Smaller groups could work on each of those and get back with a proposal for the full group after a week or two. A specific person should be responsible for following up on the work on each task.

 2) Review of list of possible participants and funding

The following agreement was made:

 The following people would be available in person for this meeting (their need for funding in parentheses):

  • Achim Wackerow (travel, accommodation, food)
  • Arofan Gregory (travel, accommodation, food)
  • Dan Gillman (accommodation, food)
  • Flavio Rizzolo (lives in Ottawa)
  • Hilde Orten (to be clarified)
  • Jay Greenfield (accommodation, food)
  • Jon Johnson (accommodation, food)
  • Larry Hoyle (accommodation, food)
  • Wendy Thomas (accommodation, food)

Most of the people would need funding from the DDI Alliance as specified in the NADDISprintPlanning_1_0.docx document.

Oliver Hopt would be available by phone.

3) Other organizational issues regarding the possible NADDI Sprint

Possibilities for meeting location and lodging have been checked and booked by Flavio and Achim as follows:

  • Two meeting rooms at StatCan for Tuesday and Wednesday
  • StatCan is closed on the Monday due to Easter. A hotel can be used for the Monday meeting at additional cost, and a room has been booked.
  • 12 rooms are reserved at the hotel. The price is a bit higher on Sunday and Monday than on Tuesday and Wednesday, due to Easter.

Two documents were sent to the AG for their feedback prior to their next meeting (also sent to the srg list):

  • The MRT DDI4 Core proposal document (MRT_DDI4Core_1_0.docx) - sent by Achim on Monday 28th.
  • An agreed, updated version of the NADDI Sprint Planning document (NADDISprintPlanning_1_0.docx) – sent by Achim after the meeting on Wednesday 30th.

Further follow-up is required regarding organizing the start-up of the work, and making plans for what needs to be prepared in advance of the possible NADDI Sprint.

Appendix

Example from Larry related to discussions of point a):

With the ability to describe data at the datum level, DDI should be able to describe data like that in the following example through transformations from traditional rectangular (wide) layouts into key-value (tall) representations.

 DDI4 can currently describe the data in the wide layout, but, though we have discussed how to do the tall representation, that work has not been completed in the model.

 Wide data table:


Corresponding tall representation:

Transformations between these layouts are common in data software packages. The SAS code below shows the transformation from the wide layout to the tall one.

Note that in the tall representation the column Source is a pointer to a variable in the wide layout. The column Value1 is not a traditional variable, in that no single value domain or concept is associated with the whole column; instead, the value domain and concept of each value depend on the pointer in Source.

If we can properly describe datum-level metadata, we should be able to describe the value domain and concept associated with the “yes” category label (which is actually a code of 1 in the SAS dataset) in the Value1 column. We should also be able to describe the meaning and units of measurement of the value 185 in the same column.
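The idea of resolving each value through its Source pointer can be sketched in a few lines. The sketch is in Python rather than SAS so that it runs without a SAS installation, and it is not part of Larry's example. The two data rows come from the SAS datalines in this appendix. The helper names `wide_to_tall` and `describe_datum` and the `variable_metadata` structure are hypothetical, introduced only for illustration.

```python
# Wide rows, copied from the datalines of the SAS example in this appendix.
wide_rows = [
    {"Name": "Joe", "Height": 185, "Answer": 1},
    {"Name": "Mary", "Height": 160, "Answer": 2},
]

# Hypothetical per-variable metadata: a label, plus a code list where one applies.
variable_metadata = {
    "Height": {"label": "height in cm", "codes": None},
    "Answer": {"label": "Answer to 'Are you happy?'", "codes": {1: "yes", 2: "no"}},
}


def wide_to_tall(rows, key, variables):
    """Return one tall row per (key value, source variable) pair."""
    tall = []
    for row in rows:
        for var in variables:
            tall.append({key: row[key], "Source": var, "Value1": row[var]})
    return tall


def describe_datum(tall_row, metadata):
    """Look up the meaning of Value1 through the pointer held in Source."""
    meta = metadata[tall_row["Source"]]
    value = tall_row["Value1"]
    codes = meta["codes"]
    display = codes[value] if codes else value
    return {"label": meta["label"], "value": value, "display": display}


tall_rows = wide_to_tall(wide_rows, "Name", ["Height", "Answer"])
for tall_row in tall_rows:
    print(tall_row, describe_datum(tall_row, variable_metadata))
```

Each tall row keeps only the key, the pointer and the value. Everything needed to interpret the value is reached through Source, which is the dependency the datum-level description would have to capture.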

/* value labels for the yes/no answer codes */
proc format;
  value yn
    1="yes"
    2="no"
    ;
run;

/* example rectangular (wide) file */
data fooWide;
  input Name $ Height Answer;
  label Name="Person name"
        Height="height in cm"
        Answer="Answer to 'Are you happy?'";
  format Answer yn.;
  datalines;
Joe 185 1
Mary 160 2
;
run;

/* BY-group processing in PROC TRANSPOSE expects the data sorted by Name */
proc sort data=work.fooWide;
  by Name;
run;

/* one output row per person and source variable */
PROC TRANSPOSE DATA=work.fooWide
     OUT=WORK.fooTall(LABEL="Transposed WORK.FOOWIDE")
     PREFIX=Value
     NAME=Source
     LABEL=Label
  ;
  BY Name;
  VAR Height Answer;
  format Value1 yn.;
run;





