
 Data Description View Team

Expand
title9 February 2017 Meeting minutes

Meeting minutes: 9 February 2017

Attendees: Dan Gillman, Jay Greenfield, Larry Hoyle, Steve McEachern

The agenda for this meeting was continued discussion of the open Q2 issues.

Issues 11 and 12:
Jay had added additional description to the LogicalDataDescription. Issues were agreed as resolved.

Issue 13:

FormatDescription now includes ValueMapping - which gives the additional "level" in the variable cascade (below IV). Dan to write a "Justifying the Variable Cascade" paper walking through the cascade. This is tangentially related to Issue 13, informing the query about relationships from Measure.

Related issue in 13 is the issue of identifying Measures and Dimensions (particularly how to establish the relationships being mapped to SDMX). There was a brief discussion of the need to clarify how Viewpoints could be used to describe these in dimensional data.
Suggestion - to develop a paper "Bridging the gap from DDI to SDMX". Dan Gillman to draft for discussion

Question from DG: In terms of Viewpoints - does the combination of Attribute roles provide an identifier for a Dimension? (Larry asked why we would want to do this.)
Cells contain the measure. Attribute combinations identify the location of the cells. So should the attribute combinations be given an IDENTIFIER role or an ATTRIBUTE role?

Some questions were raised about the cardinality of the three ViewpointRoles. May also want to have the ability to have an attribute on each Measure that is in a cell.
(As there may be MULTIPLE measures within a cell).

Jay notes that having complex identifiers is an (increasingly) common situation - for example with big data systems.

Issue is to be held over while the above papers are developed.


Issue 14: ConceptualDomain and ValueDomain

Several questions are raised here. It does appear that some of the questions will be addressed by the "Cascade" paper above. The questions do also seem to be looking at how to then connect the physical layer.

Dan to develop papers for the next meeting.


Issue 16: Documentation of DDI-Views versions of 3.2 ManagedRepresentations

Larry has prepared a SAS data file with the set of managed representations that exist in DDI3.2 (except for a ManagedScaleRepresentation), and a 3.2 instance documenting these. Now need to put together a DDI-Views instance that does the same.

Last discussion: Larry raised whether sentinel values are missing values. Dan argued the reverse - that missing values are ONE TYPE of sentinel value. Need to look at multiple Sentinel Conceptual Domains, and then look at how they could be used in combination.
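Dan's framing can be sketched in a few lines of code. This is an illustrative sketch only - the domains, codes, and function name below are invented for the example, not taken from the model:

```python
# Illustrative sketch of Dan's framing: a variable's values come either from
# a substantive domain or from one of SEVERAL sentinel domains, of which
# "missing" codes are only one type.

SUBSTANTIVE = set(range(0, 121))            # e.g. age in years

SENTINEL_DOMAINS = {
    "missing":        {-9, -8},             # refused, don't know
    "not_applicable": {-7},                 # skipped by design
    "system":         {-1},                 # system-generated code
}

def classify(value):
    """Return 'substantive' or the name of the sentinel domain a value falls in."""
    for name, codes in SENTINEL_DOMAINS.items():
        if value in codes:
            return name
    if value in SUBSTANTIVE:
        return "substantive"
    return "undefined"
```

Used in combination, the sentinel domains partition the non-substantive codes, so a value can be traced to the specific kind of sentinel it represents.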

Issue held over for next meeting.



Expand
titleJanuary 26, 2017 meeting minutes

Meeting minutes 26 Jan 2017

Attendees: Dan Gillman, Larry Hoyle, Jay Greenfield, Steve McEachern


The meeting focussed on discussion of the open issues from the Q2 review.

Comments were added to JIRA and status updated for all items discussed, and the comments are replicated below.


DDI4DATA-9

A value domain could participate in several DataTypes. As such, incorporating IntendedDataType into ValueDomain would be restrictive.
Example: if you add up a set of numbers with a Scale DataType, the result will differ from adding up the same numbers with a floating-point DataType (which has greater precision).

Right now we have IntendedDataType on RepresentedVariable - this seems to the group to be the correct place for the attribute.

Status: resolved


DDI4DATA-10

New property formatPattern to be added to ValueAndConceptDescription. It will be a 0..1 property and will use the UAX35 standard (see link in the Issue Description).
Issue assigned to Larry to complete this work.

Status: In progress


DDI4DATA-11 & 12

The comments here (in Issues 11 and 12) suggest that there is some misunderstanding of the purpose of a Viewpoint. The Viewpoint provides the capacity for the end user to describe the use of a set of variables in a particular context. (This is similar to the Measure/Attribute/Identifier roles that exist in GSIM. The difference is that in GSIM the roles are fixed, whereas in DDI the roles of Variables can change. In GSIM the roles are also applied to both dimensional and unit record data.)

Need to develop some documentation to clarify this meaning. No need for changes to the model as it stands.

Jay will add relevant documentation into Lion to address this.

Status: In progress


DDI4DATA-13

Need to clarify some misreading of the model:
a) There is a relationship from IV to Concept - as IV inherits from RV and CV.
b) DataPoint in DDI contains only one IV - whereas in GSIM it can contain one OR MORE.

Point (b) has implications, particularly if DDI considers the inclusion of complex values such as lists in DataPoints in future. The group agreed to return to this question (and Issue 13) at the next meeting, as well as Issue 14 which was not discussed.


Next meeting: Thursday February 9th 2017, 1400 CET.


...

Expand
title14 January 2016 Minutes - Data Description Meeting

Data Description meeting, 14 January 2016, 2100 CET

Attendees: Barry Radler, Flavio Rizzolo, Dan Smith, Jay Greenfield, Ornulf Risnes, Steve McEachern, Dan Gillman (from 21.40 onwards)

Apologies: Larry Hoyle


There were three outstanding questions from the previous meeting designated for discussion - see previous meeting notes below.


1. Relationships between DataPoint and DataStructure

It was agreed to remove the relationships between DataPoint and DataStructure:

  • specifiesOrder

  • specifiesIdentifierOrder

and then to add the same two relationships from DataStructure to InstanceVariable.

Questions on this point:

  1. Query from Flavio - link to DataRecord or DataStructure? Dan S. argued for DataStructure, as all DataRecords in a structure are the same - AGREED.
  2. What does DataRecord provide then? It groups together different Measures, Identifiers and Attributes with specific roles. (Note that DataRecord needs a clarification of the definition.) Ornulf clarified that the original point of the DataRecord was to group the combination of Datums (each with its InstanceVariable) and their Roles into a Collection.

Dan’s argument: DataRecord and DataStructure store data, but Viewpoint stores relationships

Flavio: DataStructure has homogeneous DataRecords only (confirmed by Ornulf)

THUS - need to add to DataStructure definition that it is a homogeneous set of DataRecords.


Agreed that the following needs to be added to the model documentation:

  • A DataStructure can have no DataRecords and therefore no DataPoints - i.e. no records yet collected. It must however have IVs to define what the DataRecord should look like.

  • A DataStructure is a Collection of homogeneous DataRecords

  • A DataRecord must have DataPoints.

  • The DataPoints are then populated with Datums

  • Ordering of IVs would be OPTIONAL (not always appropriate in a Logical structure)


Further questions:

Dan: How do we associate specific Viewpoints with the DataStructure?

Jay: Can a Viewpoint describe, for example, an RDF triple? Dan suggests that this might be possible to do with the use of Roles (e.g. Predicate is defined as an Identifier role for an IV)

Ornulf noted that some of the uses here are documented in the paper he and Dan authored at the Dagstuhl sprint:

https://docs.google.com/document/d/1-vxWdastNsTWMf8qlR35wj1128FNSX-4YBrA_MJBaLk/edit 


Different Viewpoints could be layered on top of the DataRecord. You also don’t necessarily need to use the Viewpoint.


Dan S. then noted three layers that can be used:

  1. Logical description of a DataStructure
  2. DataRecords and DataPoints
  3. Viewpoints

You will always need to use the DataStructure, but the other two will be optional

DataStructure will therefore have the following relationships:

  • Viewpoints (0 to Many) associated with a DataStructure.
  • DataRecord (0 to Many) associated with a DataStructure.
  • InstanceVariable (specifiesOrder and specifiesIdentifierOrder)
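As an illustrative sketch only (class and attribute names are assumptions loosely following the minutes, not the Lion model), the agreed relationships and the "empty structure" rule look like:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InstanceVariable:
    name: str

@dataclass
class Viewpoint:          # 0..many per DataStructure
    name: str

@dataclass
class DataRecord:         # 0..many per DataStructure; all homogeneous
    values: List[object]

@dataclass
class DataStructure:
    # IVs are required: they define what a DataRecord should look like,
    # even before any records are collected.
    instance_variables: List[InstanceVariable]
    # Viewpoints and DataRecords are both optional (0 to many).
    viewpoints: List[Viewpoint] = field(default_factory=list)
    data_records: List[DataRecord] = field(default_factory=list)

# A DataStructure with IVs but no DataRecords (no records collected yet) is legal:
ds = DataStructure(instance_variables=[InstanceVariable("age"),
                                       InstanceVariable("sex")])
```

Ordering of the IVs is optional in a logical structure, hence a plain list rather than an explicitly ordered association here.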


2. ORDERING:

Agreed that Ordering of DataRecords in DataStructure should be possible but OPTIONAL.

Ordering of InstanceVariables in a DataStructure still needs to be clarified.


3. Usecases

This point wasn't covered directly in the discussion. Agreed that there is a need for testing usecases against the model now, but need to finalise the clean-up of Lion (per Wendy Thomas's review - see minutes below). Agreed therefore that Flavio would update Lion/Drupal, and we would have a special meeting Monday Jan 25 to review this, ahead of the regular meeting on Jan 28. Steve, Jay and Flavio will convene the review meeting, with others welcome if available.


Actions:

  1. Flavio to update the model, and then Flavio/Jay/Steve to meet and confirm. (Special meeting invite for Monday week meeting).
  2. Flavio to circulate model updates to Dan G as well.
  3. Dan G. to review his position on Datum reusability, in light of model updates


Next meeting(s):

a) Review meeting Monday Jan 25th, time TBC.

b) Regular meeting Thursday Jan 28th, 10PM CET, GoToMeeting:

https://global.gotomeeting.com/join/148887013

 

(Note that meeting time will return to CET 10pm for next regular meeting.)

...

Expand
title17 December 2015 meeting minutes

Meeting minutes 17/12/2015

Attendees: Dan Gillman, Jay Greenfield, Larry Hoyle, Steve McEachern, Barry Radler, Ornulf Risnes, Chris Seymour, Dan Smith


Dan Gillman opened with a review of the PPT he provided earlier this week on “Tracking Datums”.

Key points in Dan’s proposal:

  • a DataPoint should exist only if its “parent” (a DataStructure) exists.

  • Datum is misnamed (it is actually a group of things)

  • DataPointInstance is the association of a Datum with an InstanceVariable

  • ValueDomain in the model could be either Substantive or Sentinel


Jay: What about the collection of copies of the Datum? What is this thing (if not Datum)?

Larry: How do we identify the particular Datum that is put into the DataPointInstance?

Jay asked whether Dan wants a class to indicate that all of the Datums represent the same conceptual thing. Dan agreed.

Ornulf: if we have access to the Variable Cascade, can we infer the relevant concepts associated with the Datum?


Ornulf: What does this add that we don’t already have?

  • Dan: didn’t think we have a coherent way of talking about this from the perspective of the DataPoint.

  • Ornulf indicated that he believes we can navigate much of the content in Dan’s model using the existing model

  • Dan: argues that the current model conflates the DataPoint with his new DataPointInstance

  • Dan G and Dan S both argue that the model doesn’t allow us to talk about an empty DataStructure. Dan S notes that DataPoints are NOT reused as currently specified - this appears to be a point of clarification needed between Dan’s and Ornulf’s interpretations of the model

  • Ornulf: DataPoint is related to a Record and to an InstanceVariable

  • Dan: as soon as it is associated with an InstanceVariable, a DataPoint has a relationship with a single Datum.


Jay’s interpretation was that the RHS of Dan’s model could improve the model, the LHS is more complicated. Suggests that there are two roads:

  1. Does this improve what we have?

  2. Assuming that we understand that we are storing an individual copy, ... (missing some detail on this point - please add comment here)


Dan: aim of his model is trying to associate a copy of a Datum and an InstanceVariable into a DataPointInstance.

Ornulf: not comfortable with where we are at. He argues that we CAN re-use DataPoints, and that we can track DataPoints (he is currently doing this in RAIRD). Dan asks can Ornulf reuse STRUCTURES. Jay suggests that what Ornulf is doing is actually using DataPointInstance (but naming it DataPoint, as is currently in the model). The question here is fundamentally about reusability.


Larry: Is what is "in" the DataPointInstance a Signifier? And is DataPoint the LOGICAL and DataPointInstance the PHYSICAL?

Dan: key argument is that we have the concept we want to represent (e.g. the NUMERAL five) and a series of strings that signify the concept (e.g. different strings of 5, IV, ...)

  • Conceptual: the NUMBER five

  • Represented: the SIGNIFIER - the NUMERAL five

  • Instance: the actual written down recording

  • (COMMENT FROM STEVE: Colleagues - have I got this right?)

Dan: what isn’t currently covered is the fact that DataPoints can be RE-USED. Ornulf argued that he thinks that’s covered, but Dan's position is that we don’t yet have the “empty bin”.

Dan S./Larry: are we talking about the difference between a logical and a physical, between empty and populated, ...?

(Dan G. left the meeting at this point)


Dan S. suggests that everything that Dan G. is covering is represented in the current version of the model in Lion - in particular, we can address a DataPoint from the InstanceVariable and DataRecord

HOWEVER, Dan S. did have a concern that Ordering in the DataStructure is ordering DataPoints. Dan S. suggests that ordering should be of InstanceVariables. Dan S. argued that DataStructure relationship should be to InstanceVariables rather than DataPoints.

Larry asks whether the relationship should be between the DataRecord and InstanceVariables. Dan notes that if the Record complies with the Structure, then that isn’t necessary.


Questions for discussion at the next meeting:

  1. Dan S.’s solution of realigning the relationships from DataStructure - removing the relationship to DataPoint and instead making the relationship from DataStructure to InstanceVariable - possibly addresses Dan’s concerns. Dan S. also noted that this would allow the ViewPoints and Attributes to become OPTIONAL in specifying a logical structure. Comments requested on this.

  2. Ordering concerns need to be taken into account - Ornulf argues that this doesn’t really make sense in a LOGICAL structure. Previous discussion (from Flavio) is that possibly it could be OPTIONAL. Any comments?

  3. Jay: it would be useful to have USECASES to reflect the uses of the required (IV/DS) and optional (VP/DP/DR) parts of the model. Suggested for Jay to look at the openEHR case. Could others volunteer for the simple CSV case? (Steve happy to coordinate the CSV group - would be nice to align/compare this with the new W3C TabularStructure: http://www.w3.org/TR/2015/WD-tabular-data-model-20150416/ ).


Next meeting:

January 14, 2016. GoToMeeting: https://global.gotomeeting.com/join/148887013 

Proposed time is ONE HOUR EARLIER - 2100 CET. Steve to poll group members about this.

NOTE ALSO NO MEETING DECEMBER 31


...

Expand
titleMPLS Sprint 2015-05-26 Morning Meeting Minutes

HOW FAR DO WE WANT TO GO WITH WHAT WE DESCRIBE?

Jay has put together a deck and has a proposal

He is modifying the GSIM model of “data set”.

The first interesting thing is that the way GSIM represents attributes doesn’t give them the possibility of having a structure. We’d want to modify it so it could have a structure.

This would be a hook to enter what Larry and Arofan are doing.

[See the ppt]

Discussion took place about what defines 1NF/3NF in the GSIM model and Jay’s proposal. But does it matter, or can the terms be changed for description?

The description that Jay proposed makes sense, but the terms should be changed to avoid the NF labels.

Attributes need to be worked into the GSIM model as they are variables. There are variables in the attribute sets.

LARRY - In DDI do we want to model a datum as a collection of variables or a single variable?

DAN – it’s a single.

LARRY – but then Ornulf describes a datum as a collection of variables

So what are the terms to be used if we’re calling a datum a single variable?

Datum

Data Structure

  1. “Datum Structure”
    1. identifier(s)
    2. measure(s)
    3. attribute(s)
    4. Logical records
      1. Measure(s)

Coming back to Jay’s stuff this morning.

2 different types: the logical record and the basic idea of a key-value pair

(reordering above)

  1. Logical records
  2. Key-value pair
  3. Datum structure (which builds a logical record)

Would the key-value pair be possibly triples? Graph data?

Where are we in relation to the work done yesterday? We have a basic structure to then describe a CSV file.

DAN - What could be called a key-value triple contains a variable (attribute), unit (ID), value (measure). (There are parallels between this and the datum structure.) So this is the fundamental thing. Let’s use that to define a record, and from that define a CSV.

Record is an ordered set of these key-value triples (“kvipple”) that share the same unit.
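A rough sketch of that definition, with invented names (“Kvipple” follows the nickname coined in the minutes; the helper function is not part of any model):

```python
from collections import namedtuple

# A "kvipple": variable (attribute), unit (identifier), value (measure).
Kvipple = namedtuple("Kvipple", ["unit", "variable", "value"])

def make_record(triples):
    """A record is an ordered set of kvipples that share the same unit."""
    units = {t.unit for t in triples}
    if len(units) != 1:
        raise ValueError("all kvipples in a record must share the same unit")
    return list(triples)

record = make_record([
    Kvipple(unit="person-42", variable="age",    value=40),
    Kvipple(unit="person-42", variable="sex",    value="M"),
    Kvipple(unit="person-42", variable="income", value=27000),
])
```

The shared-unit check is what makes a record cohere: triples about different units belong to different records.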

Larry making a proposal

We’ve got this record which has 3 collections associated with it: ID, Measures, Attributes. 

Record, ID, Measures, and Attributes are all collections.

Then we want to define a structure of records. That can be instantiated as a dataset

RecordSet is a set of Records (a sub-class of collection)

DataStore is a store of a RecordSet

STEVE - Can we describe a CSV at this point?

Moving from RecordSet to DataStore we move from logical to physical. We have separated the logical and physical forms.

A CSV is one type of DataStore, and all the logical parts are in the RecordSet. Fixed Format is another type of DataStore.

What does a Key-Value Triple option look like? How can this work with aggregated data?

GSIM didn’t try to tackle them all under one structure; are we trying to do it with one?

We can use the basic model of building this up, but we have to interpret it differently and have different relationships associated with it in the case of aggregates.

We need to solve the problem of dimensional data.

Take the combination of the values of each of the dimensions; every combination defines a different cell. Applied to the unit type in the microdata, each combination itself defines an aggregate unit.

Record: Cell

Unit Type (e.g. “people”)

Dimensions (e.g. “age”, “sex”)

Measure (e.g. “income”)

Key: 40 y.o. male plumbers (1..n components)

            The component could be represented by variables

Each kvipple is a cell. And every cell is a record. The unit incorporates the key.

Are we losing the dimensions?

Does the model work?

The only thing that’s really changing is the idea that the unit is going from one kind of object to an abstract collection object. The unit is the set as a completed set, not the individual elements within it.

The dimension isn’t lost; it’s a combination of aggregated variables.

Unit + dimensions+ variable + value = Key

The unit is shared by the entire cube. It describes the characteristics of the entire population. (working with census data)

For the microdata, dimensions are constant (e.g. person). For the macrodata, the unit is constant.

Key is M,40. Variable is income. Value is 27,000
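That example can be put in a short sketch (values and names are illustrative only): the key is the combination of dimension values, and each cell behaves as a record.

```python
# Illustrative sketch of the dimensional case discussed above: every
# combination of dimension values defines a different cell, and the key
# identifies an aggregate unit within the shared unit type.

unit_type = "people"                      # shared by the entire cube
dimensions = ("sex", "age")

cube = {
    ("M", 40): {"income": 27000},         # Key is (M, 40); measure is income
    ("F", 40): {"income": 29000},         # second cell, invented for contrast
}

def lookup(key, measure):
    """Resolve a cell by its key (one value per dimension) and a measure."""
    assert len(key) == len(dimensions)    # one key component per dimension
    return cube[key][measure]
```

The dimensions are not lost in this view: they survive as the components of the key, which together identify the aggregate unit for each cell.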

Is the unit the cube or the combination of things in the key?

What is the unit?

In a microdata case each cell is a record.

The unit is identified by the key; it’s the interpretation of each cell.

Dimensional data takeaways:

  • There’s something going on here


Units, whether groups or individuals, mean different things. The unit is dependent on the key.

What's the unit of analysis? The unit of the cube or the unit of the cell? What do we want to do with it?

The unit question - the answer lies in where we attach more information.

We want to put in rules for putting together different slices to put together the RecordSet in the unit.

We need to say what the "thing" is before we put everything together.

Need to look at how datum is described from the point of view of the variables.

The following email and links were provided by Ornulf following the call:

Regarding the question of relations: we've lately come across some interesting thinking in what seems to be an alternative (and more forgiving) way of Data Warehousing - Data Vault Modeling:

http://en.wikipedia.org/wiki/Data_Vault_Modeling

They have this distinction between Hubs (Units), Satellites (Datums) and Links (relations between Hubs) that looks pretty relevant.

Perhaps some of the participants have heard about this (and discarded it). If not, it's worth a glance at least.

Here's a slideshare that also goes into the newer "hyper agile" data vault solution, where satellites (datums) have a flattened-out structure:
http://www.slideshare.net/kgraziano/agile-data-warehouse-modeling-introduction-to-data-vault-data-modeling

If we strive for 3NF (not sure how I feel about that though) we definitely should take DW modeling into consideration.

Expand
titleMPLS Sprint 2015-05-25 Afternoon Meeting Minutes

In seeking to start creating a simple logical structure, we began by looking at the 4 objects that had been created during Dagstuhl: DataPoint, DataStructure, DataStore, and DataStoreSummary. Dan Gillman also began brainstorming a model of DataStructure along with the group.

Review of the DataStructure led to discussion of whether any parts of it needed to be reviewed and redesigned.

A DataStructure is an ordered set of DataPoints (a record). And a RecordSet is a collection of DataStructures (a table).
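Under these definitions, a very simple CSV might map onto the objects roughly as follows. This is a sketch with an invented helper function, not part of the model:

```python
import csv, io

def to_recordset(csv_text):
    """Read a simple CSV: each row becomes a DataStructure (an ordered set of
    DataPoints, i.e. a record), and the whole file a RecordSet (a table)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Each DataPoint pairs a column (variable name) with that row's value;
    # the row order preserves the ordering of the DataPoints.
    return [list(zip(header, row)) for row in body]

recordset = to_recordset("age,sex\n40,M\n29,F\n")
```

This only covers the "very simple CSV" case raised below, where all DataPoints in a column are the same variable; record types and sequences would need more machinery.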

The discussion raised the issue of types of records and sequence of records.

Question – do we want to describe a very simple CSV (all DataPoints in a column are the same variable), or a more complex type e.g. a Household, Person structure with record type variables and sequence variables?

If all records do not contain the same sequence of variables then we need to describe record types and sequences.

...

Expand
titleData Description Meeting Minutes 26 March 2015

 DataDescription Meeting Minutes: Thursday March 26th, 2015

Attendees: Jay Greenfield, Dan Gillman, Larry Hoyle, Barry Radler, Ornulf Risnes, Steve McEachern

Jay walked through the thinking of where the current Process model is now at, and what had fed into the work so far. He pointed out that the model (and 3.1 generally) were based on our “traditional” model of questionnaires and datasets, but that now new datatypes are becoming commonplace and possibly dominant. Our recent work has largely been exploring these types.

Known cases we are now asked to support include:

  • Administrative data

  • Qualitative data

  • Experimental data

Jay pointed out that we need to take on board a new notion of lifecycle, or in other words, per Ornulf, there is more than one way to generate a datum. Dan and Jay both pointed out that in this “new world”, we have no clear paths to a datum. This is something that needs to be further fleshed out.

Dan’s comment: The logic for Questionnaire data is clear: question - observation - capture - datum. Other cases are less so. e.g. Derivation: generates data, but requires no question. Here the input is an existing datum.

Ornulf noted that a derivation has various characteristics: it has an input datum, a formula for the derivation, and a datum as an output.
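Ornulf's characterisation (input datum, formula, output datum) can be sketched in a couple of lines; the function name and the example values are invented for illustration:

```python
# A derivation has input datum(s), a formula, and a datum as output.
# No question or capture is involved: the inputs are existing datums.

def derive(formula, *input_datums):
    """Apply a derivation formula to existing datums, yielding a new datum."""
    return formula(*input_datums)

# e.g. deriving BMI from two existing datums, weight (kg) and height (m):
bmi = derive(lambda w, h: round(w / h**2, 1), 80, 1.80)
```

The output datum is then itself a candidate input for further derivations, which is what makes the "more than one way to generate a datum" point above concrete.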

Larry gave an example from a clinical psychologist in which a process is used to collect a combination of questions and observations, but the ultimate “thing” being recorded is actually the scale score as the datum. Barry noted that there are similar sections in MIDUS where the parts are not relevant, but it is the whole that matters.

Barry points out that the step between capture and datum (subsumed now within Observation and Process Step) is “hiding” a number of significant steps - but that we can probably draw on the strength of the process model to document this.

Jay considered a similar case of Computer Adaptive Testing which works from a battery of test questions to ask a set of increasingly difficult or easy questions, and that adapts based on previous responses. Dan points out that there are some similar cases in the survey community, and Barry gave a similar case of conjoint analysis in marketing, as did Jay in EHR.

It may therefore be appropriate to start digging into the process model to see if we can accommodate some of the above use cases using the current combination of Capture, DataDescription and Process.

Jay suggested that we should be exploring these in detail - and that it cannot be rushed. It would be useful therefore to now develop these use cases to test out the current version of the model version, to (a) assess the current objects and process model, and (b) determine what else needs to be included.

Suggested worked use cases:

  • Ornulf’s derivation process for RAIRD event data

  • Larry’s clinical psychology example

  • An administrative data example (Steve??)

  • Other suggestions??

Jay noted his work with Splunk here, where they are always aggregating and disaggregating from the datum level. Dan noted worries here about confidentiality in such a process. Jay also recognised this, but pointed out the access rights associated with each datum as one means to resolve this. Ornulf also had been addressing this solution in the RAIRD work, using statistical disclosure control on the end products.

Moving forward, it was agreed to take away these use cases, and start describing using the Capture/DataDescription/Process views. Example cases are given above, but it would be good to get additional cases of interest to the members of the group - particularly where group members are collaborating on cases. This work will require some extensive thinking, so the agreement was made to continue to work on these use cases, but to switch focus for our fortnightly meeting to the Physical Data Description.

Next meeting: Thursday 9 April. Time to be confirmed (due to Daylight savings changes in Europe and Aust/NZ)

Agenda will be to review and evaluate the current status of Physical Data Description. This will need to focus on:

  • The file description

  • The logical structure.

In preparation, it would be useful if team members could review the three pieces of work so far in this area:

Expand
titleMeeting minutes 11 March 2015

Data Description Meeting 11/3/2015

Attendees: Steve McEachern (ADA, Australian National University), Larry Hoyle (IPSR, University of Kansas), Dan Gillman (BLS), Barry Radler (MIDUS, University of Wisconsin), Simon Lloyd (ABS), Ornulf Risnes (NSD)

We updated the progress since the last meeting, particularly the document Steve and Barry generated out of the "Linking..." presentation developed by Dan and Jay. This integrated model, bringing together the interface between Capture and DataDescription, is available here as a PDF, with the objects and relationships specified in the document available in the http://lion.ddialliance.org Drupal site.

  • Dan gave some initial comments on the model: What about those datums that are produced out of an observation that is not from a capture, e.g. a datum from a derived variable

  • Barry and Larry made the point that any observation is an outcome of a process - but that may not be generated by an instrument (e.g. generation of survey weights)

  • Complicating processes include: editing, computation, derivation, weighting

  • The term “observation” also alludes to originating from a physical source - whereas the above do not originate in the physical but in machine-generated processes

  • DDI 3.2 has a Generation as an output of producing a Datum from another machine data source - this might be a good existing option to draw upon

  • Capture and Generation would be sub-classes of a higher level class

  • Ornulf makes the point that this “first capture” versus later “derivations” distinction may over-complicate the model - and may also be artificial

  • It may be the case that this distinction may be better defined within the Process group (as a “Processing Cascade”??)

  • The distinction between observation and generation would then arise when you determine where this arises in the processing cascade.

  • The class could also be a base class in the Conceptual view, an “UberDatum”

The general conclusion from the discussion is that the relationship between ProcessStep, Observation and Datum looks sound, but that the ProcessStep and Observation objects may need additional work in order to see if they are sub-classes of a broader type.

Thus the next meeting will explore further the requirements both Capture and DataDescription have for the Process model. In the interim, additional email discussion will continue around comments on the Capture-DataDescription link, building on Jay’s discussion of similar issues in HL7 and OpenEHR.

The provisional time for the meeting will be Thursday March 26 at 8.00PM Central European time. The GoToMeeting URL is:

https://global.gotomeeting.com/join/148887013

However given Jay’s existing work and his role with the Process model, which are the next step in our discussion, we will coordinate times around Jay’s availability if required.

...

Expand
titleMarch 17 meeting

2014-03-17 Meeting Minutes

Time:

15:00 CET


 

Meeting URL:

https://www3.gotomeeting.com/join/685990342 


 

Agenda:

1) Status update. Where are we now with SimpleDataDescription? (ØR)

 

2) Clarify relationship between domain experts and modeler. Define role responsibilities, desired workflow in group (ØR, AW?)

 

Domain expert adds object descriptions and relationships

Modeler puts them into the overall model

Then iteration


What is the status of round trip?

Drupal to xmi to EA? Yes.

Is there machine-actionable feedback into Drupal? No. It is possible but some work is required. It is not yet clear whether there are resources for this task. Furthermore, there are different positions on whether the roundtrip makes sense.

 

3) Identified issues with the current version (ØR/all)

a) Model is sparse on properties for InstanceVariable, RepresentedVariable, ConceptualVariable. Out of scope for this group?

Comments: These objects currently only exist in the SimpleDataDescription package. Discussion about GSIM/DDI 3.2 and who’s responsible for the “core variable objects”.

b) Do we need DataSerialisation (the physical counterpart of DataDescription)? DataDescription already relates to InstanceVariable, which relates to Field (column) in the RectangularDataFile. Because of this, a path exists from the Fields in the RectangularDataFile via InstanceVariable up to DataDescription and “TOFKAS”

c) DataSerialisation has no relationship to RectangularDataFile. If we decide to keep DataSerialisation, the relationship to RectangularDataFile must surely be added.

 

4) TODO; Identify outstanding tasks (ØR/all)

  • Dan shares info on data.gov-Data dictionary

  • Dan shares a set of example data descriptions

  • Ørnulf pulls info from GSIM to produce candidate objects/properties for InstanceVariable, RepresentedVariables, ConceptualVariables

  • Larry shares findings/glossary for terms in extended attributes for SAS Enterprise Guide tool (below)

  • Ørnulf to suggest some “benchmark datasets” that can be used to document our work, and to “prove” that we are able to model a set of different data sets with our new model

  • Barry to flag potential issues from fieldwork with 3.2

    • Still a couple of months down the road

  • Ørnulf to harmonize minutes document and bring Larry’s notes in the right place

  • Ørnulf to try to arrange a meeting in April

  • Larry remembers to invite Ørnulf in case he’s needed for a virtual meeting during the NADDI sprint.


 5) Assign responsibilities for outstanding tasks (ØR/all)

See above.

 

6) Plan milestones (based upon TODO-list, goals and availability) (ØR/all)

Overall milestone plan/timelines to be clarified during NADDI sprint. Thérèse Lalor (ABS) is currently the project manager for DDI4 - but only until July 2014.


 

Other notes:


...