Meeting minutes, March 30, 2017

Attendees: Larry Hoyle, Jay Greenfield and Steve McEachern

Larry raised the question of the complexity of code lists: NodeList -> CodeList; CodeItem -> Designation; Designation -> ... etc. There is a large number of classes to be completed for a variable. He wondered, for example, whether a CodeItem should just have a property of a Code. Larry demonstrated the extent of description that is required to describe a single variable.

Jay noted that the reusability is beneficial in various situations. Steve noted that some of this complexity is also present in earlier DDI versions, and that the lack of reusability is also a problem for DDI-C (where the same code lists are repeated over and over).

The sense from the group was that we should put the capacity for "trimming" the instance on the table and see how this works (probably in the tools, not in DDI-Views). This might become a recommendation for tools as to how to approach this.

Next meeting: Thursday April 20, 1500 CEST. Comparable times: Mannheim, Germany: Thu, 20 Apr 2017 at 3:00 pm CEST
Data Description meeting minutes - 09 March 2017

Attendees: Steve McEachern, Larry Hoyle, Dan Gillman

Upcoming calendar: today, updates on action items from last meeting. Noted in reviewing:

1. Proposal for the modelling group: Dan suggested we may want to put Additivity alongside intendedDataType. In the end we need a more detailed characterisation of the variable, to allow an algorithm to determine which machine-actionable operations are permissible/appropriate on a particular variable. Need to discuss this further.
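Dan's suggestion might be sketched as follows. This is a hypothetical illustration only: the class name, attribute names, and the particular operation rules are my assumptions, not part of the DDI model.

```python
from dataclasses import dataclass

# Illustrative sketch: pairing additivity with intendedDataType so an
# algorithm can decide which operations are permissible on a variable.
# All names and rules here are assumptions, not the DDI model itself.
@dataclass
class VariableTraits:
    intended_data_type: str  # e.g. "integer", "decimal", "string"
    additivity: str          # e.g. "additive", "non-additive"

def permissible_operations(v: VariableTraits) -> set:
    """Derive machine-actionable operations from the variable's traits."""
    ops = {"count"}  # counting cases is always meaningful
    if v.intended_data_type in ("integer", "decimal"):
        ops |= {"min", "max", "mean"}  # numeric, ordered domain
        if v.additivity == "additive":
            ops.add("sum")  # only additive measures may be summed
    return ops

income = VariableTraits("decimal", "additive")           # summable
temperature = VariableTraits("decimal", "non-additive")  # mean yes, sum no
```

The point of the sketch is that a tool holding only these two characterisations can refuse, say, summing temperatures while still permitting their mean.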
- No place for this (which came out of 3.2).
- Doesn't currently have a relationship to InstanceVariable?

5. InstanceVariable: has "measures" from two different things:

Next meeting: March 23, 1400 CET
Data Description meeting, 14 January 2016, 2100 CET

Attendees: Barry Radler, Flavio Rizzolo, Dan Smith, Jay Greenfield, Ornulf Risnes, Steve McEachern, Dan Gillman (from 21.40 onwards)
Apologies: Larry Hoyle

There were three outstanding questions from the previous meeting designated for discussion - see previous meeting notes below.

1. Relationships between DataPoint and DataStructure

It was agreed to remove the relationships between DataPoint and DataStructure, and then add relationships from DataStructure to InstanceVariable - the same two relationships above. Questions on this point:

Dan's argument: DataRecord and DataStructure store data, but Viewpoint stores relationships. Flavio: DataStructure has homogeneous DataRecords only (confirmed by Ornulf). THUS the DataStructure definition needs to state that it is a homogeneous set of DataRecords. Agreed that the following needs to be added to the model documentation:

Further questions: Dan: How do we associate specific Viewpoints with the DataStructure? Jay: Can a Viewpoint describe, for example, an RDF triple? Dan suggests that this might be possible with the use of Roles (e.g. Predicate is defined as an Identifier role for an IV). Ornulf noted that some of the uses here are documented in the paper he and Dan authored at the Dagstuhl sprint: https://docs.google.com/document/d/1-vxWdastNsTWMf8qlR35wj1128FNSX-4YBrA_MJBaLk/edit

Different Viewpoints could be layered on top of the DataRecord. You also don't necessarily need to use the Viewpoint. Dan S. noted that there are three layers that can be used:

You will always need to use the DataStructure, but the other two will be optional. DataStructure will therefore have the following relationships:

2. ORDERING: Agreed that ordering of DataRecords in a DataStructure should be possible but OPTIONAL. Ordering of InstanceVariables in a DataStructure still needs to be clarified.

3. Use cases: This point wasn't covered directly in the discussion. Agreed that there is a need for testing use cases against the model now, but we first need to finalise the clean-up of Lion (per Wendy Thomas's review - see minutes below). Agreed therefore that Flavio would update Lion/Drupal, and we would hold a special meeting Monday Jan 25 to review this, ahead of the regular meeting on Jan 28. Steve, Jay and Flavio will convene the review meeting, with others welcome if available.

Actions:

Next meeting(s): a) Review meeting Monday Jan 25th, time TBC. b) Regular meeting Thursday Jan 28th, 10PM CET, GoToMeeting: https://global.gotomeeting.com/join/148887013

(Note that the meeting time will return to 10pm CET for the next regular meeting.)
Meeting minutes 17/12/2015

Attendees: Dan Gillman, Jay Greenfield, Larry Hoyle, Steve McEachern, Barry Radler, Ornulf Risnes, Chris Seymour, Dan Smith

Dan Gillman opened with a review of the PPT he provided earlier this week on "Tracking Datums". Key points in Dan's proposal:

Jay: What about the collection of copies of the Datum? What is this thing (if not Datum)?
Larry: How do we identify the particular Datum that is put into the DataPointInstance?
Jay asked whether Dan wants a class to indicate that all of the Datums represent the same conceptual thing. Dan agreed.
Ornulf: If we have access to the Variable Cascade, can we infer the relevant concepts associated with the Datum? What does this add that we don't already have?

Jay's interpretation was that the RHS of Dan's model could improve the model, while the LHS is more complicated. He suggests that there are two roads:

Dan: the aim of his model is to associate a copy of a Datum and an InstanceVariable into a DataPointInstance.
Ornulf: not comfortable with where we are at. He argues that we CAN re-use DataPoints, and that we can track DataPoints (he is currently doing this in RAIRD). Dan asks whether Ornulf can reuse STRUCTURES. Jay suggests that what Ornulf is doing is actually using DataPointInstance (but naming it DataPoint, as is currently in the model). The question here is fundamentally about reusability.
Larry: Is what is "in" the DataPointInstance a Signifier? And is DataPoint the LOGICAL and DataPointInstance the PHYSICAL?
Dan: the key argument is that we have the concept we want to represent (e.g. the NUMERAL five) and a series of strings that signify the concept (e.g. different strings: 5, IV, ...)

Dan: what isn't currently covered is the fact that DataPoints can be RE-USED. Ornulf argued that he thinks that's covered, but Dan's position is that we don't yet have the "empty bin".
Dan S./Larry: Are we talking about the difference between a logical and a physical, between empty and populated, ...?

(Dan G. left the meeting at this point)

Dan S. suggests that everything Dan G. is covering is represented in the current version of the model in Lion - in particular, we can address a DataPoint from the InstanceVariable and DataRecord. HOWEVER, Dan S. did have a concern that Ordering in the DataStructure is ordering DataPoints; he suggests that ordering should be of InstanceVariables, and argued that the DataStructure relationship should be to InstanceVariables rather than DataPoints. Larry asks whether the relationship should be between the DataRecord and InstanceVariables. Dan notes that if the Record complies with the Structure, then that isn't necessary.

Questions for discussion at the next meeting:

Next meeting: January 14, 2016. GoToMeeting: https://global.gotomeeting.com/join/148887013. Proposed time is ONE HOUR EARLIER - 2100 CET. Steve to poll group members about this. NOTE ALSO: NO MEETING DECEMBER 31.
HOW FAR DO WE WANT TO GO WITH WHAT WE DESCRIBE?

Jay has put together a deck and has a proposal. He is modifying the GSIM model of "data set". The first interesting thing is that the way GSIM represents attributes doesn't give them the possibility of a structure. We'd want to modify it so it could have a structure. This would be a hook into what Larry and Arofan are doing.

Discussion took place about what defines 1NF/3NF in the GSIM model and Jay's proposal. But does it matter, or can the terms be changed for description? The description that Jay proposed makes sense, but terms should be changed to avoid NFs. Attributes need to be worked into the GSIM model as they are variables. There are variables in the attribute sets.

LARRY - In DDI do we want to model a datum as a collection of variables or a single variable?
DAN - It's a single.
LARRY - But then Ornulf describes a datum as a collection of variables. So what are the terms to be used if we're calling a datum a single variable? Datum. Data Structure.

Coming back to Jay's material this morning: 2 different types, the logical record and the basic idea of the key-value pair (reordering above).

Could the key-value pair possibly be triples? Graph data? Where are we in relation to the work done yesterday? We have a basic structure with which to describe a CSV file.

DAN - What could be called a key-value triple contains a variable (attribute), unit (ID), and value (measure). (There are parallels between this and the datum structure.) So this is the fundamental thing. Let's use that to define a record, and from that define a CSV. A record is an ordered set of these key-value triples ("kvipples") that share the same unit.

Larry made a proposal: we've got this record, which has 3 collections associated with it: ID, Measures, Attributes. Record, ID, Measures, and Attributes are all collections. Then we want to define a structure of records, which can be instantiated as a dataset:

- RecordSet is a set of Records (a sub-class of Collection)
- DataStore is a store of a RecordSet

STEVE - Can we describe a CSV at this point? Moving from RecordSet to DataStore we move from logical to physical. We have separated the logical and physical forms. A CSV is one type of DataStore, and all the logical parts are in the RecordSet. Fixed Format is another type of DataStore. What does a Key-Value Triple option look like?

How can this work with aggregated data? GSIM didn't try to tackle them all under one structure; are we trying to do it with one? We can use the basic model of building this up, but we have to interpret it differently and have different relationships associated with it in the case of aggregates. We need to solve the problem of dimensional data. Take the combination of the values of each of the dimensions; every combination defines a different cell. Applied to the unit type in the microdata, it itself defines an aggregate unit.

Record: Cell
Unit Type (e.g. "people")
Dimensions (e.g. "age", "sex")
Measure (e.g. "income")
Key: 40 y.o. male plumbers (1..n components)

The components could be represented by variables. Each kvipple is a cell. And every cell is a record.
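The record model proposed above might be sketched like this. The class names follow the terms coined in the discussion (kvipple, Record, RecordSet, DataStore), but the shapes and field names are illustrative assumptions, not a finalised DDI model.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass(frozen=True)
class Kvipple:
    """A key-value triple: variable (attribute), unit (ID), value (measure)."""
    variable: str  # e.g. "income"
    unit: str      # identifier of the unit, e.g. person "P1"
    value: Any     # e.g. 27000

@dataclass
class Record:
    """An ordered set of kvipples that all share the same unit."""
    triples: List[Kvipple]
    def __post_init__(self) -> None:
        if len({t.unit for t in self.triples}) > 1:
            raise ValueError("a Record's triples must share one unit")

@dataclass
class RecordSet:
    """Logical side: a set of Records (a sub-class of Collection)."""
    records: List[Record] = field(default_factory=list)

@dataclass
class DataStore:
    """Physical side: a store of a RecordSet, e.g. CSV or fixed format."""
    recordset: RecordSet
    physical_format: str  # assumed discriminator: "csv", "fixed", ...

# One record for a hypothetical person P1, then the logical/physical split:
p1 = Record([Kvipple("age", "P1", 34), Kvipple("income", "P1", 27000)])
store = DataStore(RecordSet([p1]), "csv")
```

The logical/physical separation discussed above shows up here as the boundary between RecordSet (what the data mean) and DataStore (how they are serialised).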
The unit incorporates the key. Are we losing the dimensions? Does the model work? The only thing that's really changing is the idea that the unit is going from one kind of object to an abstract collection object. It's the set as a completed set, not the individual element within it, that is the unit. The dimension isn't lost; it's a combination of aggregated variables.

Unit + dimensions + variable + value = Key

The unit is shared by the entire cube. It describes the characteristics of the entire population (working with census data). For the microdata the dimensions are constant (e.g. person); for the macrodata the unit is constant. Key is M,40. Variable is income. Value is 27,000. Is the unit the cube or the combination of things in the key? What is the unit? In a microdata case each cell is a record. The unit is identified by the key; it's the interpretation of each cell.

Dimensional data takeaways: units, whether groups or individuals, mean different things, and the unit is dependent on the key. What's the unit of analysis - the unit of the cube or the unit of the cell? What do we want to do with it? For the unit question, the answer lies in where we attach more information. We want to put in rules for combining different slices to put together the RecordSet in the unit. We need to say what the "thing" is before we put everything together. Need to look at how the datum is described from the point of view of the variables.

The following email and links were provided by Ornulf following the call: Regarding the question of relations, we've lately come across some interesting thinking in what seems to be an alternative (and more forgiving) way of Data Warehousing; Data Vault Modeling:
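The dimensional discussion above (every combination of dimension values defines a different cell, and each cell is a record) might be sketched as follows. The dimension categories and the M/40+/income/27,000 example come from the notes; everything else is illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict

# Sketch of the cube: unit type shared by the whole cube, dimensions whose
# value combinations enumerate the cells, and one measure per cell.
# Category codes below are illustrative assumptions.
unit_type = "people"                                  # shared by the entire cube
dimensions = {"sex": ["M", "F"], "age": ["<40", "40+"]}
measure = "income"

# Every combination of dimension values defines a different cell key.
keys = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]

@dataclass
class Cell:
    key: Dict[str, str]  # e.g. {"sex": "M", "age": "40+"} identifies the cell
    variable: str        # the measure, constant across the cube
    value: float

# The worked example from the notes: key M,40; variable income; value 27,000.
cell = Cell({"sex": "M", "age": "40+"}, measure, 27000)
```

Note how the takeaway above is visible in the code: the unit (the cube's population) carries no per-cell information, while the key alone distinguishes one cell record from another.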
In seeking to start creating a simple logical structure, we began by looking at the 4 objects that had been created during Dagstuhl: DataPoint, DataStructure, DataStore, and DataStoreSummary. Dan Gillman also began brainstorming a model of DataStructure along with the group. Review of the DataStructure led to discussion of whether any parts of it needed to be reviewed and redesigned.

A DataStructure is an ordered set of DataPoints (a record). And a RecordSet is a collection of DataStructures (a table). The discussion raised the issue of types of records and sequence of records. Question - do we want to describe a very simple CSV (all DataPoints in a column are the same variable), or a more complex type, e.g. a Household/Person structure with record type variables and sequence variables? If all records do not contain the same sequence of variables then we need to describe record types and sequences.
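The "very simple CSV" question can be stated mechanically: a file is simple when every record carries the same sequence of variables, so each column holds DataPoints of a single variable. A minimal sketch (the function name and file layout are my own, not from the model):

```python
import csv
import io

def is_simple_rectangular(text: str) -> bool:
    """True when every record has one DataPoint per variable in the header,
    i.e. all DataPoints in a column belong to the same variable."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return all(len(row) == len(header) for row in body)

simple = "hhid,person,age\n1,1,34\n1,2,29\n"
ragged = "hhid,person,age\n1,1,34\n1,2\n"  # a record missing a variable

print(is_simple_rectangular(simple))  # True
print(is_simple_rectangular(ragged))  # False
```

A Household/Person file with different record types would fail this check, which is exactly the point at which record type and sequence variables become necessary.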
DataDescription Meeting Minutes: Thursday March 26th, 2015

Attendees: Jay Greenfield, Dan Gillman, Larry Hoyle, Barry Radler, Ornulf Risnes, Steve McEachern

Jay walked through the thinking behind where the current Process model now stands, and what had fed into the work so far. He pointed out that the model (and 3.1 generally) were based on our "traditional" model of questionnaires and datasets, but that new datatypes are now becoming commonplace and possibly dominant. Our recent work has largely been exploring these types. Known cases we are now asked to support include:

Jay pointed out that we need to take on board a new notion of lifecycle - or in other words, per Ornulf, there is more than one way to generate a datum. Dan and Jay both pointed out that in this "new world", we have no clear paths to a datum. This is something that needs to be further fleshed out.

Dan's comment: the logic for questionnaire data is clear: question - observation - capture - datum. Other cases are less so, e.g. derivation: it generates data but requires no question; here the input is an existing datum. Ornulf noted that a derivation has various characteristics: an input datum, a formula for the derivation, and a datum as an output.

Larry gave an example from a clinical psychologist in which a process is used to collect a combination of questions and observations, but the ultimate "thing" being recorded is actually the scale score as the datum. Barry noted that there are similar sections in MIDUS where the parts are not relevant, but it is the whole that matters. Barry points out that the step between capture and datum (subsumed now within Observation and ProcessStep) is "hiding" a number of significant steps - but we can probably draw on the strength of the process model to document this.

Jay considered the similar case of Computer Adaptive Testing, which works from a battery of test questions to ask a set of increasingly difficult or easy questions, adapting based on previous responses. Dan points out that there are some similar cases in the survey community; Barry gave a similar case of conjoint analysis in marketing, as did Jay in EHR.

It may therefore be appropriate to start digging into the process model to see if we can accommodate some of the above use cases using the current combination of Capture, DataDescription and Process. Jay suggested that we should be exploring these in detail - and that it cannot be rushed.

It would therefore be useful to develop these use cases now to test the current version of the model: (a) to assess the current objects and process model, and (b) to determine what else needs to be included. Suggested worked use cases:

Jay noted his work with Splunk here, where they are always aggregating and disaggregating from the datum level. Dan noted worries about confidentiality in such a process. Jay also recognised this, but pointed to the access rights associated with each datum as one means to resolve it. Ornulf has also been addressing this in the RAIRD work, using statistical disclosure control on the end products.

Moving forward, it was agreed to take away these use cases and start describing them using the Capture/DataDescription/Process views. Example cases are given above, but it would be good to get additional cases of interest to the members of the group - particularly where group members are collaborating on cases.

This work will require some extensive thinking, so it was agreed to continue working on these use cases, but to switch the focus of our fortnightly meeting to the Physical Data Description.

Next meeting: Thursday 9 April. Time to be confirmed (due to daylight savings changes in Europe and Aust/NZ). The agenda will be to review and evaluate the current status of the Physical Data Description. This will need to focus on:

In preparation, it would be useful if team members could review the three pieces of work so far in this area:
Data Description Meeting 11/3/2015

Attendees: Steve McEachern (ADA, Australian National University), Larry Hoyle (IPSR, University of Kansas), Dan Gillman (BLS), Barry Radler (MIDUS, University of Wisconsin), Simon Lloyd (ABS), Ornulf Risnes (NSD)

We reviewed the progress since the last meeting, particularly the document Steve and Barry generated out of the "Linking..." presentation developed by Dan and Jay. This integrated model, bringing together the interface between Capture and DataDescription, is available here as a PDF, with the objects and relationships specified in the document available in the http://lion.ddialliance.org Drupal site.

The general conclusion from the discussion was that the relationship between ProcessStep, Observation and Datum looks sound, but that the ProcessStep and Observation objects may need additional work to see if they are sub-classes of a broader type. The next meeting will therefore explore further the requirements that both Capture and DataDescription have for the Process model. In the interim, additional email discussion will continue around comments on the Capture-DataDescription link, building on Jay's discussion of similar issues in HL7 and OpenEHR.

The provisional time for the meeting is Thursday March 26 at 8.00PM Central European Time. The GoToMeeting URL is: https://global.gotomeeting.com/join/148887013. However, given Jay's existing work and his role with the Process model, which is the next step in our discussion, we will coordinate times around Jay's availability if required.
2014-03-17 Meeting Minutes

Time: 15:00 CET
Meeting URL: https://www3.gotomeeting.com/join/685990342

Agenda:

1) Status update. Where are we now with SimpleDataDescription? (ØR)

2) Clarify the relationship between domain experts and modeler. Define role responsibilities and the desired workflow in the group (ØR, AW?)

The domain expert adds object descriptions and relationships; the modeler puts them into the overall model; then iteration. What is the status of the round trip - Drupal to XMI to EA? Yes. Is there machine-actionable feedback into Drupal? No. It is possible, but some work is required, and it is not yet clear whether there are resources for this task. Furthermore, there are different positions on whether the round trip makes sense.

3) Identified issues with the current version (ØR/all)

a) The model is sparse on properties for InstanceVariable, RepresentedVariable, ConceptualVariable. Out of scope for this group? Comments: these objects currently exist only in the SimpleDataDescription package. Discussion about GSIM/DDI 3.2 and who is responsible for the "core variable objects".

b) Do we need DataSerialisation (the physical counterpart of DataDescription)? DataDescription already relates to InstanceVariable, which relates to Field (column) in the RectangularDataFile. Because of this, a path exists from the Fields in the RectangularDataFile via InstanceVariable up to DataDescription and "TOFKAS".

c) DataSerialisation has no relationship to RectangularDataFile. If we decide to keep DataSerialisation, the relationship to RectangularDataFile must surely be added.

4) TODO: identify outstanding tasks (ØR/all)

See above.

6) Plan milestones (based upon the TODO list, goals and availability) (ØR/all)

The overall milestone plan/timelines are to be clarified during the NADDI sprint. Thérèse Lalor (ABS) is currently the project manager for DDI4 - but only until July 2014.

Other notes: