Methodology Team page |
ATTENDEES: Jay and Dan

I had an interesting conversation with Dan Gillman the other day about the part of the Methodology model that pertains to Workflows. Basically, I have been looking for a way to describe a Workflow with some specificity without having to go through the pain of conditionally sequencing a series of WorkflowSteps. Of course, my purpose was not to assist a software agent in executing a Workflow. Instead it was to give structure to the methodology section of a paper or thesis that a human agent could use either to evaluate or replicate a result.

Dan characterized Methodology as a triple consisting of a Design, an Algorithm, and a Process. To document a Methodology, Dan indicated, we might use any one or two or all three elements of the triple. He described an Algorithm as the "formalization" of a Design and a Process as the "instantiation" of an Algorithm.

Two use cases in connection with Methodology and the triple came to mind. In one use case knowledge was sufficiently advanced that there were several algorithms available and in use in connection with a methodology; sorting came to mind. In a second use case knowledge was not sufficiently advanced, there was no well-known algorithm, and we were forced into experimentation using ProcessSteps, ControlConstructs, and the like.

Right now the Workflows package has a Workflow which "realizes" a Process. Otherwise it has a Design, an Algorithm, and "contains" a WorkflowSequence. So, indeed, the Workflow class walks and talks like a Methodology. If Workflow were to "realize" a Methodology (instead of a Process), we could short-circuit our use of Workflows and just describe a Workflow, limiting ourselves to two elements of the methodology triple only: Design and Algorithm.

OK. So how do we express an algorithm? Wikipedia has thought about this a lot: Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables (processed by interpreters).
Natural language expressions of algorithms tend to be verbose and ambiguous, and are rarely used for complex or technical algorithms. Pseudocode, flowcharts, drakon-charts and control tables are structured ways to express algorithms that avoid many of the ambiguities common in natural language statements. Programming languages are primarily intended for expressing algorithms in a form that can be executed by a computer, but are often used as a way to define or document algorithms. There is a wide variety of representations possible and one can express a given Turing machine program as a sequence of machine tables (see more at finite state machine, state transition table and control table), as flowcharts and drakon-charts (see more at state diagram), or as a form of rudimentary machine code or assembly code called "sets of quadruples" (see more at Turing machine). Representations of algorithms can be classed into three accepted levels of Turing machine description:
Based on the thinking Wikipedia has done, for certain purposes we might use any of the three description levels above in place of a Process description, using nothing from the Workflows package and the ProcessPattern it realizes apart from just one class -- Workflow -- which implements a Methodology. In this instance Workflow would NOT be connected with a WorkflowSequence and, as a consequence, it would NOT contain any WorkflowSteps. Instead, using one style of expression or another, these steps would be described in an Algorithm.

An algorithm is not "machine ready". This statement, however, deserves more attention. Consider the following. Minsky: "But we will also maintain, with Turing . . . that any procedure which could 'naturally' be called effective, can in fact be realized by a (simple) machine. Although this may seem extreme, the arguments . . . in its favor are hard to refute". Gurevich: "...Turing's informal argument in favor of his thesis justifies a stronger thesis: every algorithm can be simulated by a Turing machine ... according to Savage [1987], an algorithm is a computational process defined by a Turing machine". I would only dare to add that a Turing machine is more a heuristic device (an abstract machine through which we learn about computers) and less an actual one. This is in line with Dan's view that an algorithm is, in the final analysis, NOT an implementation. Implementation is the domain of a Process.

By way of example consider pseudocode. Pseudocode is not machine ready. Neither are natural language, flowcharts, or control tables. All require an "interpreter". That interpreter could be a machine, but that machine would be the product of machine learning orchestrated by a human. Once more: none of these algorithms are machine ready. All require interpretation. |
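Dan's triple can be sketched as a small data model in which any one, two, or all three elements document a Methodology, and a Workflow that realizes a Methodology can stop at Design and Algorithm. This is an illustrative sketch only, assuming hypothetical class and field names that are not taken from the DDI4 model itself:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch of the Design/Algorithm/Process triple.
# Names and fields are illustrative assumptions, not the DDI4 model.

@dataclass
class Design:
    description: str  # the intent: what the method is meant to achieve

@dataclass
class Algorithm:
    notation: str  # e.g. "pseudocode", "flowchart", "natural language"
    text: str      # the formalization of the Design; not machine ready

@dataclass
class Process:
    steps: list[str] = field(default_factory=list)  # the instantiation

@dataclass
class Methodology:
    # Any one, two, or all three elements may be used to document a method.
    design: Optional[Design] = None
    algorithm: Optional[Algorithm] = None
    process: Optional[Process] = None

# A Workflow realizing a Methodology can limit itself to two elements:
sort_method = Methodology(
    design=Design("Order records by key, ascending"),
    algorithm=Algorithm("pseudocode", "while unsorted: compare adjacent; swap"),
)
assert sort_method.process is None  # no ProcessSteps required
```

The point the sketch makes is structural: the Process slot can simply be left empty when the documentation goal is evaluation or replication by a human reader rather than execution by a machine.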
ATTENDEES: Larry, Jay, Michelle, Steve
View: need to get the Workflow and Process pattern correct. Workflow classes for Qualitative, coding, and segmentation
Quick DD and FHIR discussion TODO:
|
ATTENDEES: Dan G., Wendy, Jay, Larry, Barry, Steve

Sampling: Looking at the Sampling Methodology, the SamplingPreMethodologyPattern, the Sampling Design PowerPoint, and the SDI sampling work
Questions regarding roles of specific classes:
NEXT MEETING'S AGENDA: Review SamplingMethodology prior to the meeting:
TO DO: Need to write a document of:
The Wiki page needs to be updated to relay current thinking. |
ATTENDEES: Dan G., Wendy, Jay, Larry, Michelle Agenda / Minutes:
1. Discussion regarding whether the pattern is the correct way to go, looking into the future. Dan does have reservations regarding the MethodologyPattern, so the modeling perspective may not necessarily reflect future methodologies. Wendy will include a question with the Q2 release asking whether folks believe that the pattern is the best way to go for Methodology.
2. Consider dropping the Methodologies package in line with Flavio’s comment in the attachment
3. Consider Larry’s comment that some things missing in Lion packages that we should have: Specific methodologies derived from the Methodology pattern (e.g. sampling and weighting) (Michelle)
4. Get an update from Larry on how we might use the MethodologyPattern in the StudyRelated package (new)
Next Steps:
Next Meeting: Sept 19, 2016 - I'll email to confirm |
ATTENDEES: Larry, Flavio, Wendy, Jay, Michelle
Next Steps:
Follow-up:
Next Meeting: August 22, 2016 -
|
ATTENDEES: Larry, Steve, Flavio, Wendy, Michelle
Next Steps:
Next Meeting: July 25, 2016 |
ATTENDEES: Wendy, Jay, Michelle
Michelle will contact team members to find a time that works best for everyone. |
ATTENDEES: Wendy, Larry, Jay, Barry (first 10 minutes)

Methodology: We talked about changes made to the Methodology model during the Edmonton sprint and what the implications were for instantiating the model either as is or as a pattern. If it is instantiated "as is" then it would need some additional semantics to indicate the type of Design/Algorithm/Process. If it is turned into a pattern, which classes should be abstract, and how would the pattern be realized by, say, the Sampling Method? Some points brought up in discussion:
ACTION: See how the Sampling Method would look given the changes in the Methodology model (Jay will play with this). It would be useful to have both an example of how to realize the pattern/instantiate the model AND how this could be trimmed in view of various levels of information (general description, detailed processing, relaying information to the data user). It looks like there would be work here for the sprint in the context of the Codebook group, and also some preparation work to allow the Codebook group to explore possibilities for using the Methodology model effectively |
ATTENDEES: Wendy, Larry, Dan G., Steve, Natalja Regrets: Flavio, Barry, Michelle

AGENDA: Discussed documents sent out by Wendy (model and discussion). We started by looking at the changes resulting from making the link between Methodology and the Core Process model clearer, and at Wendy's understanding (or misunderstanding) of the relationship between Method and Process and between Method and Design
What is a Method? Is it actually an Algorithm and if so where should it fit in the model?
Further discussion of Algorithm
Where we got to:
For next week:
|
Attendees: Dan Gillman, Larry Hoyle, Steve McEachern, Barry Radler, Wendy Thomas, Michelle Edwards
Design models:
Agenda for 21 March 2016:
|
Minutes for Methodology team meeting, 8 February, 2016 Attendees: Larry Hoyle, Steve McEachern, Jay Greenfield, Dan Gillman, Michelle Edwards, Barry Radler Regrets: Dan's Coding Design
Codebook:
Tasks for next meeting (Monday, February 22, 2016):
|
Minutes for Methodology team meeting, 25 January, 2016 |
Minutes for Methodology team meeting, 11 January, 2016 Attendees: Larry Hoyle, Steve McEachern, Jay Greenfield, Dan Gillman, Michelle Edwards, Barry Radler Regrets: Flavio Rizzolo, Marcel Hebing, Natalja Menold
Next meeting: 25 January 2016 Agenda:
|
Minutes for Methodology team meeting, 8 December, 2015 Attendees: Larry Hoyle, Steve McEachern, Jay Greenfield, Dan Gillman, Michelle Edwards, Barry Radler Regrets: Flavio Rizzolo, Marcel Hebing, Natalja Menold
Life Cycle and Methodology
Different designs to examine and volunteers:
Designs to investigate:
|
Attendees: Larry Hoyle, Barry Radler, Wendy Thomas, Michelle Edwards Regrets: Dan Gillman, Flavio Rizzolo, Marcel Hebing, Natalja Menold, Jay Greenfield, Anita Rocher, Steve McEachern
We did talk about the terms of this group; there is currently a lack of clear direction and terms of reference for this working group. It's been around in different iterations for a while, but it is time to come up with a clear mission and goal for the group. Michelle will review all documentation available for this group and start a Terms document to circulate to the team for additions, deletions, etc. Michelle will also review documentation to create a document that highlights what we have accomplished and where we are going; this would be great information for the upcoming Dagstuhl workshop. She will send it around for additions, deletions, and comments. Barry also talked about the Data Capture group and thought it should be a smaller "view" within this group; it fits between Data Description and Methodology. Meeting adjourned |
Attendees: Dan Gillman, Barry Radler, Anita Rocha, Jay Greenfield, Flavio Rizzolo, Larry Hoyle, Wendy Thomas, Steve McEachern (joined later), Michelle Edwards Apologies: Marcel Hebing

This was our first meeting with Michelle as Chair. The primary goals of this meeting were to review the document of proposed goals for this group and set out a work plan. A quick review of the document linked here led to a brief discussion of the definition and what was meant by the term "design model". Examples that have been developed in the past include: sampling, questionnaire design, weighting, and paradata. The "design model" provides us with the basis to develop the models to be added to the Methodology Model. The team agreed to contribute to a list, posted here, of design models they feel should be included in our work. Dan provided a great list as a starting point on the call. It was noted by Barry, and supported by others, that we need to think beyond survey designs. Larry reminded us that we also need to think about analysis designs. Our discussion continued regarding some of the design models we identified at the Minneapolis sprint. Linkage between the Methodology team and Data Capture? This is still a question, and relates to a larger question for the Moving Forward project. Unsure whether Methodology will be referring to the Process model: are we using it directly? There is a pattern here! The question should be: is there a pattern of usage? How will people use the pattern, or rather, let us show you how to use this model for this particular case - the intersect point. Tasks to keep the work going:
Future tasks:
Next meeting is set for August 5 at 6pm Eastern. We will review this meeting time in September - but for the next month we will maintain this time. |
Minutes for Methodology team meeting, 17 June 2015 Attendees: Michelle Edwards, Jay Greenfield, Steve McEachern, Flavio Rizzolo, Larry Hoyle, Wendy Thomas Apologies: Dan Gillman, Barry Radler, Anita Rocha

This was the first GoToMeeting of the Methodology group, and was intended to establish a regular working group for the Methodology content in DDI. Steve introduced the purpose of the meeting and set out the initial agenda:
Michelle Edwards (CISER) agreed to take on the chairing of the Methodology group. Steve chaired the first meeting with Michelle to take over in future. Steve began with a discussion of the outputs of the Minneapolis sprint, in particular the discussions with the MPC group. Steve then presented the work done by Steve, Anita and Michelle on mapping out the existing DDI-C against the current DDI4 output (see attached images - 1, 2, 3). Wendy noted that the current content of DDI3.2 also largely imported the DDI-C content, and that it did need significant revision and modernisation, and potential deprecation of some content to bring it up to current practice. It was discussed that this breakdown might form part of the initial group workplan. Wendy also noted that we have three broad areas for each type of method that we may want to model:
Wendy also suggested that Analysis might want to be introduced into this methods breakdown. Jay Greenfield then presented the work he had done on introducing the model into Drupal (http://lion.ddialliance.org/package/methodology). In doing so, he also introduced some new objects (including Method, Goal, BusinessFunction and Precondition) to enable consistency with BPEL/BPMN, and discussed the capacity for use of two swimlanes (in BPMN terms), managing workflows in one and managing data in the other. There was general agreement that this approach would be suitable, and could potentially enable both machine-driven processing and historical process description. There was some final discussion about the Usage/Unit/Variable/Guidance section of the "ubermodel". The basis of this model originated with the Weighting work coming from the SDI team that is for discussion in DDI3.3. (Dan, Steve, Wendy and Anita all contributed to this team.) Jay raised concerns that there may be some challenges with implementing usages that he would like to explore further. He also noted that there are some strong parallels with the DatumStructure developed by the DataDescription group that may be able to be leveraged in this Usage modelling. The group agreed that this would be the starting point for the next meeting. To conclude, the group agreed to convene on a bi-weekly basis. Michelle as new chair will organise a regular meeting time for the group, and convene the next meeting at a time to be determined. |
Steve gave an overview of what has developed with the Methodology Model. In turn, Barry described what has been going on with Instrument. ConceptualInstrument is a design of some kind of data capture, which is then very similar to Design in the Methodology Model. There may be a lot of interplay between Data Capture and the Methodology Model. One possibility is that measurement turns into something more abstract, so "question" is a type of measurement, etc. INTERPLAY between these models: ConceptualInstrument = Design; ImplementedInstrument = Process. Rationale isn't in Data Capture, but that's fine because it's about why you're using the instrument that's been designed. Two things for Monday:
Fundamentally DataCapture and the Methodology Model are very similar. |
Should the Process box be broken down? Jay proposed changing the name to "Process Step", which could loop back on itself, and there could be multiple Process Steps. A Process Step would lead to a Sequence and then to more Process Steps. Some users may want that. But would everyone? Would it seem like they would be required to fill it all out? Could "Process Step" somehow be optional? Is it up to the user to expand the "Process" down to whatever level they want? This may be more of a documentation and marketing issue. We should perhaps put it out giving an example of the higher-level "Process", and then also give an example and the option of the analytical level of "Process Steps". Can we break out the "Design" and "Process" boxes if we need to, and not if we don't need to? If we want to break out Design, we'd really have to give guidance on how to build a model. But Designs are so method-specific. We could put an extension point on Design, and then people could create the specific Design for whatever Methodology and then attach it. The key is that we need an exit point at the level needed by the user. "Process" may have a repeatable "Process Step" attached to it that allows users to go to the level of detail they desire (see the model in Drupal). Action Item:
Conclusion:
|
Where are we? And what do we need to get done? What are the plans for Harmonization and Confidentialization? The basic approach that we are trying to develop is a framework for describing the process of creating and applying a particular methodology – in this case, harmonization and confidentialization. As such, what we would like to walk through tomorrow is to try to articulate each of the basic steps used in your process. What we would like to do is to:
It may be that you have some of these processes already compiled into procedure documents or websites. You may also have a general description of the overall design of a methodological process. These would be useful documents to walk through if you have them available. If not, we will use the walk-through to step through and describe (briefly!) what those process steps include. We'll then take a look at all the processes and see if there's any commonality. Meeting with MPC to get a description of Harmonization and Confidentialization. Guests: Lara Cleveland, Patt Kelly-Hall, Miriam King. How do you say what you intend to do? How do you say what you did? We want to understand different methodological processes and what they go through to see if our model holds up. We want to look at the Design, Process, and Rationale of these methods. In Harmonization and Confidentialization, are there other boxes we need to add to our model? At the beginning of MPC they took a sample of the 1880 census, and then took another one and made the codes the same to simplify. When MPC is harmonizing, it's not always the same thing. Also, they deal with different languages. Everything is made by people, so there can be errors and a lot of variation. There were principles of what Data Integration IPUMS was going to do, which may be part of the design of how they harmonized information. The process:
The integration of this on the methodology side, is understanding that it's the output of one thing and then the input of the next thing. So we're interested in what the processes are to go from output to input.
There are multiple versions of integration, so the model would need to be able to fit those into the framework. The set of activities: Input Material > Pre-processing > Standardization > Integration. Confidentiality feels more compact and concrete. IPUMSI process:
There are versions, and notes if there are major changes between annual versions. Does MPC provide weights? If the depositor has drawn the sample and weights, they go through the system; if not, MPC draws them. MPC provides them with the syntax to correct the weights.
(Images of board so far: Image 1, Image 2, Image 3)
Is this intentional design or accumulated process? There are some design principles. |
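The activity chain named above (Input Material > Pre-processing > Standardization > Integration) can be read as a pipeline in which the output of each stage is the input of the next. A toy sketch, with invented stage logic standing in for MPC's actual steps:

```python
# Toy sketch of the harmonization activity chain. The stage functions
# are invented placeholders, not MPC's real procedures.

def pre_process(records):
    # e.g. drop empty records and normalize whitespace
    return [r.strip() for r in records if r.strip()]

def standardize(records):
    # e.g. map source-specific spellings onto one common spelling
    return [r.lower() for r in records]

def integrate(records):
    # e.g. collapse per-source outputs into one harmonized set
    return sorted(set(records))

def harmonize(input_material):
    # the output of one stage is the input of the next
    data = input_material
    for stage in (pre_process, standardize, integrate):
        data = stage(data)
    return data

assert harmonize([" Farmer", "farmer ", "", "TEACHER"]) == ["farmer", "teacher"]
```

The design point is that each stage is independently describable (and versionable), which is what makes the output-of-one-thing/input-of-the-next framing documentable at all.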
GSBPM/GLBPM gives you all the process steps three levels down. How far down the rabbit hole do you go? That's the user's choice. We have to decide how far down we support. With what tool do we capture the DDI process? Could it be done all in BPMN and then reference the relevant DDI? The level of Process Steps may be where the DDI process is more useful. There may be a worry that the complexity may be required by users even though they could choose the level of complexity. Would that mean taking the Process Model out of DDI? At the end of looking at all the information about Weighting in the morning: when describing the process, it could be done at different levels for different audiences. For some purposes (e.g. provenance) there's a need to get into some pretty hairy details. We need something that's either inside the standard or can talk to the standard. What are we proposing the DDI Process Model for? Instrument, Prescriptive Process (proposed), Historical Process (proposed), Methodology. If there are already lots of models out there, what are we providing here that's different? Something that natively understands DDI metadata. We don't want to duplicate and compete, but we do want to provide what our users need that's native inside the system. The idea was always to interface with BPMN in the Process Model. There are issues when it comes to calculations, new languages being developed, etc. There was a clearer sense of a region in which we would operate and a region in which BPMN would operate. There's a document from the Washington DDI conference that revisits this. How far down the rabbit hole? Not producing a new language, or calculations, but being able to incorporate them down there along with having the inputs and outputs. Let's address who will use this:
We need to produce something that's not a mandatory part of the process, so we can go down a variety of levels. There is a need for a Process Model; but should it be coded in DDI? For provenance and preservation purposes, having the ability to put that material in means that down the road, when it's archived, it's in a form that DDI will understand. The Methodology model wouldn't say how to do coding, weighting, etc., but says what parts are needed for those things. We're trying to develop a robust and inclusive Methodology Model that will show the basic necessary objects, and then give extension points for people to drill down into the details of their methods. Could we explode the Design box in the model? If not, then that's an extension point for people creating specific methodologies. Conclusions:
Can we describe how we do that extension point? Images from later afternoon: |
The work that we're looking at at the moment is how far we can go with this rationale model. Yesterday we applied this model to weighting. We tried to apply this to sampling. The recognition was that "x" is an outcome. For sampling it's a selection, since sampling is the design and process part of the model. "Variable" - is it necessary? Or should it have a different cardinality of 0..*? Is this single-stage or multi-stage? It does account for the multi-stage, and it's executed more at the "design" part. The description of the overall process is in the Design, but the execution is in the Process, by a set of process steps. The Sampling Model would be a description of the Design. The questions for today and tomorrow are how to break out the Design and Process boxes. Could you do something similar to fit Coding and Weighting into the Methodological Model? Concentrating on the Design box: can we standardize the Design box for feeding into the Process box? There may be overlap with Instrument. Looking at Arofan and Jay's document from Dagstuhl. Here we're saying what we're going to do in a method for a Process. This may be useful to look at for the InputType, InputInstance, Citation, OutputInstance, etc. This is very similar to GSIM (in the central column). We should really look at GSIM. One takeaway is that you can describe a method at any level of granularity. You could have a process step design that says you're doing a multi-step thing, and then the process model wouldn't go down and describe each stage specifically but the entire process of stages. In the document there's an early example that looks like a higher-level example of Process, in which each box could be a Process Step; it is similar to GLBPM. Let's take a quick look at GSIM and Instrument and see if we're getting similar things, as Barry is seeing a lot of overlap with Instrument. Looking at the GSIM Process
Can we describe methodologies in terms of design in this way or not? Can we describe weighting and coding in a basic input and output approach? Barry has already used Instrument to create what is a Methodological Design.
Let's map out a Coding and Weighting Process. CODING: Develop an "index" mapping phrases to codes. Develop rules for disambiguation (where would a body of knowledge go in DDI4?). Rules for difficult cases. Identify people's roles:
Manual process informs machine learning process. Automated system first, then manual. Update of the machine learning database. QA on a sample of coded data. Identification of training needs. Conclusion: This can be done at a high level, but a real question is whether someone would want to; this is quite difficult. Narrative description of a Coding Process Model: Pre-Inputs
Input a case (input = piece of text) Check for automated system
Automated system
Manual system input indexes
Rules check
Reference Material
QC System (inputs: case, code assigned, source flag)
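The coding flow above (check the automated system first, fall back to manual coding, carry a source flag into QC) could be sketched roughly as follows. The index, codes, and QC sampling rate are invented for illustration; `manual_code` stands in for a human coder consulting rules and reference material:

```python
import random

# Minimal sketch of the narrated coding flow: automated system first,
# then manual, with (case, code assigned, source flag) going to QC.
# All codes and index entries below are invented.

AUTO_INDEX = {"farmer": "6111", "teacher": "2341"}  # phrase -> code

def manual_code(text: str) -> str:
    # Stand-in for a human coder applying rules and reference material.
    return "9999"  # "difficult case" catch-all in this toy example

def code_case(text: str) -> dict:
    phrase = text.strip().lower()
    if phrase in AUTO_INDEX:  # check for automated system hit
        return {"case": text, "code": AUTO_INDEX[phrase], "source": "auto"}
    return {"case": text, "code": manual_code(text), "source": "manual"}

def qc_sample(coded: list[dict], rate: float = 0.1, seed: int = 0) -> list[dict]:
    # QC system receives a sample of (case, code assigned, source flag).
    rng = random.Random(seed)
    k = max(1, int(len(coded) * rate))
    return rng.sample(coded, k)

cases = ["Farmer", "astronaut", "Teacher"]
coded = [code_case(c) for c in cases]
assert [c["source"] for c in coded] == ["auto", "manual", "auto"]
```

The manual results could then feed back into `AUTO_INDEX`, which is the "manual process informs machine learning process" loop described above.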
|
Looking at some of the history and purpose of this group. Reviewing the Methodology Model to look at how it operates. We have our example case; can we fit it and other methodology approaches into this framework? This is where we could work from. One of the parts is separating the process from the output - which is "x". Sampling Plan Model explained by Dan [see Sampling slide deck]. What's the purpose of having strata outside the frame, while a cluster would be in the frame? It's useful to have the strata outside to tell what it is. One thing that's not here is a full description of a systematic sample. How can the Sample Model fit into the Methodology Model?
What if "x" is Selection?
Harmonization would be a nice check of this model. We need to give direction to other teams to see if we can fit such things into the model. If we sat down with someone (e.g. harmonization?), what would we ask them?
Where we got to:
What does the process look like? How much detail we want in Process may be open for discussion. GSIM did produce a Process model that we could incorporate. We have a Process model that at this level becomes Process Steps. There are two processes: the process of creating and the process of implementing - we want the process of implementing the design. There is the Process Model on Drupal - Jay gave a brief description. Steve's question: can we describe in our design and rationale the appropriate inputs? Is this a generic enough model to implement? Can we describe the process? More work definitely needs to be done to execute it. There are decisions about design and process that are outside the actual description of what you're doing... why? We're going to look at Harmonization and Confidentialization on Thursday. |
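One possible reading of "a Process model that at this level becomes Process Steps" is a recursive step structure, where each step may optionally expand into finer-grained steps, letting the user choose the level of detail. A minimal sketch, assuming hypothetical class and field names rather than the DDI4 model:

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Illustrative only: "ProcessStep" mirrors the name used in the minutes,
# but the fields here are assumptions, not the DDI4 model.

@dataclass
class ProcessStep:
    name: str
    # A step may optionally expand into a sequence of finer-grained steps;
    # leaving this empty keeps the description at a high level.
    substeps: list[ProcessStep] = field(default_factory=list)

def depth(step: ProcessStep) -> int:
    """How many levels of detail the author chose to describe."""
    if not step.substeps:
        return 1
    return 1 + max(depth(s) for s in step.substeps)

# High-level description only -- no drill-down required:
weighting = ProcessStep("Apply weights")
assert depth(weighting) == 1

# The same step, expanded by an author who wants more detail:
weighting.substeps = [
    ProcessStep("Compute base weights"),
    ProcessStep("Nonresponse adjustment",
                substeps=[ProcessStep("Fit response model")]),
]
assert depth(weighting) == 3
```

This makes the drill-down optional by construction: a reader gets a valid description at depth 1, and provenance-oriented users can expand as far as they need.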
Notes from the discussion clarifying the intent and coverage of this group |