Meeting minutes, March 30, 2017

Attendees: Larry Hoyle, Jay Greenfield and Steve McEachern

Larry raised the question of the complexity of code lists: NodeList -> CodeList; CodeItem -> Designation; Designation -> ... etc. There is a large number of classes to be completed for a variable. He wondered, for example, whether a CodeItem should just have a property of a Code. Larry demonstrated the extent of description that is required to describe a single variable.

Jay noted that the reusability is beneficial in various situations. Steve noted that some of this complexity is also present in earlier DDI versions, and that the lack of reusability is also a problem for DDI-C (where the same code lists are repeated over and over).

The sense from the group was that we should put the capacity for "trimming" the instance on the table and see how this works (probably in the tools, not in DDI-Views). This might become a recommendation for tools as to how to approach this.

Next meeting: Thursday April 20, 1500 CEST. Comparable times: Mannheim, Germany: Thu, 20 Apr 2017 at 3:00 pm CEST
Data Description meeting minutes - 09 March 2017

Attendees: Steve McEachern, Larry Hoyle, Dan Gillman

Upcoming calendar: today, updates on action items from last meeting. Noted in reviewing:

1. Proposal for the modelling group: Dan suggested we may want to put Additivity alongside intendedDataType. In the end we need a more detailed characterisation of the variable, to allow an algorithm to determine which machine-actionable operations are permissible/appropriate on a particular variable. Need to discuss this further.
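Dan's suggestion might be sketched as follows. This is a hypothetical illustration only: the class name, attribute names, and the particular operation rules are my assumptions, not part of the DDI model.

```python
from dataclasses import dataclass

# Illustrative sketch: pairing additivity with intendedDataType so an
# algorithm can decide which operations are permissible on a variable.
# All names and rules here are assumptions, not the DDI model itself.
@dataclass
class VariableTraits:
    intended_data_type: str  # e.g. "integer", "decimal", "string"
    additivity: str          # e.g. "additive", "non-additive"

def permissible_operations(v: VariableTraits) -> set:
    """Derive machine-actionable operations from the variable's traits."""
    ops = {"count"}  # counting cases is always meaningful
    if v.intended_data_type in ("integer", "decimal"):
        ops |= {"min", "max", "mean"}  # numeric, ordered domain
        if v.additivity == "additive":
            ops.add("sum")  # only additive measures may be summed
    return ops

income = VariableTraits("decimal", "additive")           # summable
temperature = VariableTraits("decimal", "non-additive")  # mean yes, sum no
```

The point of the sketch is that a tool holding only these two characterisations can refuse, say, summing temperatures while still permitting their mean.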
- No place for this (which came out of 3.2).
- Doesn't currently have a relationship to InstanceVariable?

5. InstanceVariable: has "measures" from two different things:

Next meeting: March 23, 1400 CET
Data Description meeting, 14 January 2016, 2100 CET

Attendees: Barry Radler, Flavio Rizzolo, Dan Smith, Jay Greenfield, Ornulf Risnes, Steve McEachern, Dan Gillman (from 21.40 onwards)
Apologies: Larry Hoyle

There were three outstanding questions from the previous meeting designated for discussion - see previous meeting notes below.

1. Relationships between DataPoint and DataStructure

It was agreed to remove the relationships between DataPoint and DataStructure, and then add relationships from DataStructure to InstanceVariable - the same two relationships above. Questions on this point:

Dan's argument: DataRecord and DataStructure store data, but Viewpoint stores relationships. Flavio: DataStructure has homogeneous DataRecords only (confirmed by Ornulf). THUS the DataStructure definition needs to state that it is a homogeneous set of DataRecords. Agreed that the following needs to be added to the model documentation:

Further questions: Dan: How do we associate specific Viewpoints with the DataStructure? Jay: Can a Viewpoint describe, for example, an RDF triple? Dan suggests that this might be possible with the use of Roles (e.g. Predicate is defined as an Identifier role for an IV). Ornulf noted that some of the uses here are documented in the paper he and Dan authored at the Dagstuhl sprint: https://docs.google.com/document/d/1-vxWdastNsTWMf8qlR35wj1128FNSX-4YBrA_MJBaLk/edit

Different Viewpoints could be layered on top of the DataRecord. You also don't necessarily need to use the Viewpoint. Dan S. noted that there are three layers that can be used:

You will always need to use the DataStructure, but the other two will be optional. DataStructure will therefore have the following relationships:

2. ORDERING: Agreed that ordering of DataRecords in a DataStructure should be possible but OPTIONAL. Ordering of InstanceVariables in a DataStructure still needs to be clarified.

3. Use cases: This point wasn't covered directly in the discussion. Agreed that there is a need for testing use cases against the model now, but we first need to finalise the clean-up of Lion (per Wendy Thomas's review - see minutes below). Agreed therefore that Flavio would update Lion/Drupal, and we would hold a special meeting Monday Jan 25 to review this, ahead of the regular meeting on Jan 28. Steve, Jay and Flavio will convene the review meeting, with others welcome if available.

Actions:

Next meeting(s): a) Review meeting Monday Jan 25th, time TBC. b) Regular meeting Thursday Jan 28th, 10PM CET, GoToMeeting: https://global.gotomeeting.com/join/148887013

(Note that the meeting time will return to 10pm CET for the next regular meeting.)
Meeting minutes 17/12/2015

Attendees: Dan Gillman, Jay Greenfield, Larry Hoyle, Steve McEachern, Barry Radler, Ornulf Risnes, Chris Seymour, Dan Smith

Dan Gillman opened with a review of the PPT he provided earlier this week on "Tracking Datums". Key points in Dan's proposal:

Jay: What about the collection of copies of the Datum? What is this thing (if not Datum)?
Larry: How do we identify the particular Datum that is put into the DataPointInstance?
Jay asked whether Dan wants a class to indicate that all of the Datums represent the same conceptual thing. Dan agreed.
Ornulf: If we have access to the Variable Cascade, can we infer the relevant concepts associated with the Datum? What does this add that we don't already have?

Jay's interpretation was that the RHS of Dan's model could improve the model, while the LHS is more complicated. He suggests that there are two roads:

Dan: the aim of his model is to associate a copy of a Datum and an InstanceVariable into a DataPointInstance.
Ornulf: not comfortable with where we are at. He argues that we CAN re-use DataPoints, and that we can track DataPoints (he is currently doing this in RAIRD). Dan asks whether Ornulf can reuse STRUCTURES. Jay suggests that what Ornulf is doing is actually using DataPointInstance (but naming it DataPoint, as is currently in the model). The question here is fundamentally about reusability.
Larry: Is what is "in" the DataPointInstance a Signifier? And is DataPoint the LOGICAL and DataPointInstance the PHYSICAL?
Dan: the key argument is that we have the concept we want to represent (e.g. the NUMERAL five) and a series of strings that signify the concept (e.g. different strings: 5, IV, ...)

Dan: what isn't currently covered is the fact that DataPoints can be RE-USED. Ornulf argued that he thinks that's covered, but Dan's position is that we don't yet have the "empty bin".
Dan S./Larry: Are we talking about the difference between a logical and a physical, between empty and populated, ...?

(Dan G. left the meeting at this point)

Dan S. suggests that everything Dan G. is covering is represented in the current version of the model in Lion - in particular, we can address a DataPoint from the InstanceVariable and DataRecord. HOWEVER, Dan S. did have a concern that Ordering in the DataStructure is ordering DataPoints; he suggests that ordering should be of InstanceVariables, and argued that the DataStructure relationship should be to InstanceVariables rather than DataPoints. Larry asks whether the relationship should be between the DataRecord and InstanceVariables. Dan notes that if the Record complies with the Structure, then that isn't necessary.

Questions for discussion at the next meeting:

Next meeting: January 14, 2016. GoToMeeting: https://global.gotomeeting.com/join/148887013. Proposed time is ONE HOUR EARLIER - 2100 CET. Steve to poll group members about this. NOTE ALSO: NO MEETING DECEMBER 31.
HOW FAR DO WE WANT TO GO WITH WHAT WE DESCRIBE?

Jay has put together a deck and has a proposal. He is modifying the GSIM model of "data set". The first interesting thing is that the way GSIM represents attributes doesn't give them the possibility of a structure. We'd want to modify it so it could have a structure. This would be a hook into what Larry and Arofan are doing.

Discussion took place about what defines 1NF/3NF in the GSIM model and Jay's proposal. But does it matter, or can the terms be changed for description? The description that Jay proposed makes sense, but terms should be changed to avoid NFs. Attributes need to be worked into the GSIM model as they are variables. There are variables in the attribute sets.

LARRY - In DDI do we want to model a datum as a collection of variables or a single variable?
DAN - It's a single.
LARRY - But then Ornulf describes a datum as a collection of variables. So what are the terms to be used if we're calling a datum a single variable? Datum. Data Structure.

Coming back to Jay's material this morning: 2 different types, the logical record and the basic idea of the key-value pair (reordering above).

Could the key-value pair possibly be triples? Graph data? Where are we in relation to the work done yesterday? We have a basic structure with which to describe a CSV file.

DAN - What could be called a key-value triple contains a variable (attribute), unit (ID), and value (measure). (There are parallels between this and the datum structure.) So this is the fundamental thing. Let's use that to define a record, and from that define a CSV. A record is an ordered set of these key-value triples ("kvipples") that share the same unit.

Larry made a proposal: we've got this record, which has 3 collections associated with it: ID, Measures, Attributes. Record, ID, Measures, and Attributes are all collections. Then we want to define a structure of records, which can be instantiated as a dataset:

- RecordSet is a set of Records (a sub-class of Collection)
- DataStore is a store of a RecordSet

STEVE - Can we describe a CSV at this point? Moving from RecordSet to DataStore we move from logical to physical. We have separated the logical and physical forms. A CSV is one type of DataStore, and all the logical parts are in the RecordSet. Fixed Format is another type of DataStore. What does a Key-Value Triple option look like?

How can this work with aggregated data? GSIM didn't try to tackle them all under one structure; are we trying to do it with one? We can use the basic model of building this up, but we have to interpret it differently and have different relationships associated with it in the case of aggregates. We need to solve the problem of dimensional data. Take the combination of the values of each of the dimensions; every combination defines a different cell. Applied to the unit type in the microdata, it itself defines an aggregate unit.

Record: Cell
Unit Type (e.g. "people")
Dimensions (e.g. "age", "sex")
Measure (e.g. "income")
Key: 40 y.o. male plumbers (1..n components)

The components could be represented by variables. Each kvipple is a cell. And every cell is a record.
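The record model proposed above might be sketched like this. The class names follow the terms coined in the discussion (kvipple, Record, RecordSet, DataStore), but the shapes and field names are illustrative assumptions, not a finalised DDI model.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass(frozen=True)
class Kvipple:
    """A key-value triple: variable (attribute), unit (ID), value (measure)."""
    variable: str  # e.g. "income"
    unit: str      # identifier of the unit, e.g. person "P1"
    value: Any     # e.g. 27000

@dataclass
class Record:
    """An ordered set of kvipples that all share the same unit."""
    triples: List[Kvipple]
    def __post_init__(self) -> None:
        if len({t.unit for t in self.triples}) > 1:
            raise ValueError("a Record's triples must share one unit")

@dataclass
class RecordSet:
    """Logical side: a set of Records (a sub-class of Collection)."""
    records: List[Record] = field(default_factory=list)

@dataclass
class DataStore:
    """Physical side: a store of a RecordSet, e.g. CSV or fixed format."""
    recordset: RecordSet
    physical_format: str  # assumed discriminator: "csv", "fixed", ...

# One record for a hypothetical person P1, then the logical/physical split:
p1 = Record([Kvipple("age", "P1", 34), Kvipple("income", "P1", 27000)])
store = DataStore(RecordSet([p1]), "csv")
```

The logical/physical separation discussed above shows up here as the boundary between RecordSet (what the data mean) and DataStore (how they are serialised).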
The unit incorporates the key. Are we losing the dimensions? Does the model work? The only thing that's really changing is the idea that the unit is going from one kind of object to an abstract collection object. It's the set as a completed set, not the individual element within it, that is the unit. The dimension isn't lost; it's a combination of aggregated variables.

Unit + dimensions + variable + value = Key

The unit is shared by the entire cube. It describes the characteristics of the entire population (working with census data). For the microdata the dimensions are constant (e.g. person); for the macrodata the unit is constant. Key is M,40. Variable is income. Value is 27,000. Is the unit the cube or the combination of things in the key? What is the unit? In a microdata case each cell is a record. The unit is identified by the key; it's the interpretation of each cell.

Dimensional data takeaways: units, whether groups or individuals, mean different things, and the unit is dependent on the key. What's the unit of analysis - the unit of the cube or the unit of the cell? What do we want to do with it? For the unit question, the answer lies in where we attach more information. We want to put in rules for combining different slices to put together the RecordSet in the unit. We need to say what the "thing" is before we put everything together. Need to look at how the datum is described from the point of view of the variables.

The following email and links were provided by Ornulf following the call: Regarding the question of relations, we've lately come across some interesting thinking in what seems to be an alternative (and more forgiving) way of Data Warehousing; Data Vault Modeling:
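The dimensional discussion above (every combination of dimension values defines a different cell, and each cell is a record) might be sketched as follows. The dimension categories and the M/40+/income/27,000 example come from the notes; everything else is illustrative.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict

# Sketch of the cube: unit type shared by the whole cube, dimensions whose
# value combinations enumerate the cells, and one measure per cell.
# Category codes below are illustrative assumptions.
unit_type = "people"                                  # shared by the entire cube
dimensions = {"sex": ["M", "F"], "age": ["<40", "40+"]}
measure = "income"

# Every combination of dimension values defines a different cell key.
keys = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]

@dataclass
class Cell:
    key: Dict[str, str]  # e.g. {"sex": "M", "age": "40+"} identifies the cell
    variable: str        # the measure, constant across the cube
    value: float

# The worked example from the notes: key M,40; variable income; value 27,000.
cell = Cell({"sex": "M", "age": "40+"}, measure, 27000)
```

Note how the takeaway above is visible in the code: the unit (the cube's population) carries no per-cell information, while the key alone distinguishes one cell record from another.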
In seeking to start creating a simple logical structure, we began by looking at the 4 objects that had been created during Dagstuhl: DataPoint, DataStructure, DataStore, and DataStoreSummary. Dan Gillman also began brainstorming a model of DataStructure along with the group. Review of the DataStructure led to discussion of whether any parts of it needed to be reviewed and redesigned.

A DataStructure is an ordered set of DataPoints (a record). And a RecordSet is a collection of DataStructures (a table). The discussion raised the issue of types of records and sequence of records. Question - do we want to describe a very simple CSV (all DataPoints in a column are the same variable), or a more complex type, e.g. a Household/Person structure with record type variables and sequence variables? If all records do not contain the same sequence of variables then we need to describe record types and sequences.
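The "very simple CSV" question can be stated mechanically: a file is simple when every record carries the same sequence of variables, so each column holds DataPoints of a single variable. A minimal sketch (the function name and file layout are my own, not from the model):

```python
import csv
import io

def is_simple_rectangular(text: str) -> bool:
    """True when every record has one DataPoint per variable in the header,
    i.e. all DataPoints in a column belong to the same variable."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return all(len(row) == len(header) for row in body)

simple = "hhid,person,age\n1,1,34\n1,2,29\n"
ragged = "hhid,person,age\n1,1,34\n1,2\n"  # a record missing a variable

print(is_simple_rectangular(simple))  # True
print(is_simple_rectangular(ragged))  # False
```

A Household/Person file with different record types would fail this check, which is exactly the point at which record type and sequence variables become necessary.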
DataDescription Meeting Minutes: Thursday March 26th, 2015

Attendees: Jay Greenfield, Dan Gillman, Larry Hoyle, Barry Radler, Ornulf Risnes, Steve McEachern

Jay walked through the thinking behind where the current Process model now stands, and what had fed into the work so far. He pointed out that the model (and 3.1 generally) were based on our "traditional" model of questionnaires and datasets, but that new datatypes are now becoming commonplace and possibly dominant. Our recent work has largely been exploring these types. Known cases we are now asked to support include:

Jay pointed out that we need to take on board a new notion of lifecycle - or in other words, per Ornulf, there is more than one way to generate a datum. Dan and Jay both pointed out that in this "new world", we have no clear paths to a datum. This is something that needs to be further fleshed out.

Dan's comment: the logic for questionnaire data is clear: question - observation - capture - datum. Other cases are less so, e.g. derivation: it generates data but requires no question; here the input is an existing datum. Ornulf noted that a derivation has various characteristics: an input datum, a formula for the derivation, and a datum as an output.

Larry gave an example from a clinical psychologist in which a process is used to collect a combination of questions and observations, but the ultimate "thing" being recorded is actually the scale score as the datum. Barry noted that there are similar sections in MIDUS where the parts are not relevant, but it is the whole that matters. Barry points out that the step between capture and datum (subsumed now within Observation and ProcessStep) is "hiding" a number of significant steps - but we can probably draw on the strength of the process model to document this.

Jay considered the similar case of Computer Adaptive Testing, which works from a battery of test questions to ask a set of increasingly difficult or easy questions, adapting based on previous responses. Dan points out that there are some similar cases in the survey community; Barry gave a similar case of conjoint analysis in marketing, as did Jay in EHR.

It may therefore be appropriate to start digging into the process model to see if we can accommodate some of the above use cases using the current combination of Capture, DataDescription and Process. Jay suggested that we should be exploring these in detail - and that it cannot be rushed.

It would therefore be useful to develop these use cases now to test the current version of the model: (a) to assess the current objects and process model, and (b) to determine what else needs to be included. Suggested worked use cases:

Jay noted his work with Splunk here, where they are always aggregating and disaggregating from the datum level. Dan noted worries about confidentiality in such a process. Jay also recognised this, but pointed to the access rights associated with each datum as one means to resolve it. Ornulf has also been addressing this in the RAIRD work, using statistical disclosure control on the end products.

Moving forward, it was agreed to take away these use cases and start describing them using the Capture/DataDescription/Process views. Example cases are given above, but it would be good to get additional cases of interest to the members of the group - particularly where group members are collaborating on cases.

This work will require some extensive thinking, so it was agreed to continue working on these use cases, but to switch the focus of our fortnightly meeting to the Physical Data Description.

Next meeting: Thursday 9 April. Time to be confirmed (due to daylight savings changes in Europe and Aust/NZ). The agenda will be to review and evaluate the current status of the Physical Data Description. This will need to focus on:

In preparation, it would be useful if team members could review the three pieces of work so far in this area:
Data Description Meeting 11/3/2015

Attendees: Steve McEachern (ADA, Australian National University), Larry Hoyle (IPSR, University of Kansas), Dan Gillman (BLS), Barry Radler (MIDUS, University of Wisconsin), Simon Lloyd (ABS), Ornulf Risnes (NSD)

We reviewed the progress since the last meeting, particularly the document Steve and Barry generated out of the "Linking..." presentation developed by Dan and Jay. This integrated model, bringing together the interface between Capture and DataDescription, is available here as a PDF, with the objects and relationships specified in the document available in the http://lion.ddialliance.org Drupal site.

The general conclusion from the discussion was that the relationship between ProcessStep, Observation and Datum looks sound, but that the ProcessStep and Observation objects may need additional work to see if they are sub-classes of a broader type. The next meeting will therefore explore further the requirements that both Capture and DataDescription have for the Process model. In the interim, additional email discussion will continue around comments on the Capture-DataDescription link, building on Jay's discussion of similar issues in HL7 and OpenEHR.

The provisional time for the meeting is Thursday March 26 at 8.00PM Central European Time. The GoToMeeting URL is: https://global.gotomeeting.com/join/148887013. However, given Jay's existing work and his role with the Process model, which is the next step in our discussion, we will coordinate times around Jay's availability if required.
2014-03-17 Meeting Minutes

Time: 15:00 CET
Meeting URL: https://www3.gotomeeting.com/join/685990342

Agenda:

1) Status update. Where are we now with SimpleDataDescription? (ØR)

2) Clarify the relationship between domain experts and modeler. Define role responsibilities and the desired workflow in the group (ØR, AW?)

The domain expert adds object descriptions and relationships; the modeler puts them into the overall model; then iteration. What is the status of the round trip - Drupal to XMI to EA? Yes. Is there machine-actionable feedback into Drupal? No. It is possible, but some work is required, and it is not yet clear whether there are resources for this task. Furthermore, there are different positions on whether the round trip makes sense.

3) Identified issues with the current version (ØR/all)

a) The model is sparse on properties for InstanceVariable, RepresentedVariable, ConceptualVariable. Out of scope for this group? Comments: these objects currently exist only in the SimpleDataDescription package. Discussion about GSIM/DDI 3.2 and who is responsible for the "core variable objects".

b) Do we need DataSerialisation (the physical counterpart of DataDescription)? DataDescription already relates to InstanceVariable, which relates to Field (column) in the RectangularDataFile. Because of this, a path exists from the Fields in the RectangularDataFile via InstanceVariable up to DataDescription and "TOFKAS".

c) DataSerialisation has no relationship to RectangularDataFile. If we decide to keep DataSerialisation, the relationship to RectangularDataFile must surely be added.

4) TODO: identify outstanding tasks (ØR/all)

See above.

6) Plan milestones (based upon the TODO list, goals and availability) (ØR/all)

The overall milestone plan/timelines are to be clarified during the NADDI sprint. Thérèse Lalor (ABS) is currently the project manager for DDI4 - but only until July 2014.

Other notes: