TC Meeting Minutes 2020-2021
- Wendy Thomas
- Jon Johnson
- Darren Bell
- Mary Vardigan
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Flavio, Larry, Ingo
regrets: Darren, Oliver, Barry
Ingo is the contact from the Scientific Board
- Improve contact with working groups so that the Scientific Board has a better sense of the ongoing activities of the working group
- Help report and facilitate work of the group, seek help where needed (resources etc.)
- TC felt that it would be helpful to have that outside/neutral reporter
Technical/Implementer focused documentation
- Possible working group
- Set up a couple of implementer meetings for brainstorming
- Prepare materials - what kinds of things are people finding hardest to understand
- Useful, easy-to-generate information
- ACTION: Prepare something for early January
In-person meeting update
- Jared stated that we will need to reapply for money in the new year
- Space OK
- Make clear how a virtual work event will affect the agenda of in-person meeting
Virtual Work Event on COGS in spring
- Half days might work better
- April would not be bad - follow up at the face-to-face
- COGS:
- Validation of input to CSV and then from CSV to output implementations
- Decisions regarding changes to input (choice etc.)
- Serializations and implementation languages JSON, OWL, XMI, XML
ACTION: Clear statement of what needs to be done and when
DOI follow-up
ACTION:
- Send out the note to all users with link to JIRA issue tracker
- What is on their wish list - what and how do they need to cite
- DOI for top level DDI Alliance site as a project
Mailman follow-up
ACTION:
- Write up a piece and then lateral to Marketing
- If we go to Google Groups (MGroups format)
- Ask the Marketing team if they want to have a community discussion web site
- Or do we just want email? Make the decision before moving platforms
Quick discussion of upcoming Scientific Community meeting
- Not the annual meeting (bylaw triggered meeting of reporting and review of scientific plan)
- 2 Scientific Community meetings for community motivation
- Some topics that are being addressed in groups but need broader expansion
- URN resolution, mapping within suite, mapping with outside standards
- Generate interest and targeted activity and participation
- There are plenty of proposals in the scientific plan that could be chewed on
- Doing advertisement from TC to things we want to create or promote
Extra Action Item:
- EDDI videos will be turned around quickly as received.
- Jon will follow-up and notify us when the update on Agency Registry is ready for reference so we can make announcements
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Flavio, Darren, Oliver, Larry, Johan
AGENDA:
CV update from CESSDA
- Update on export of data from the CESSDA system so that we can work on import process
- Oliver has had conflicts the past month; he hopes to have something by the end of next week
- Carsten is signing on with Slovakian group to fix issues
- Darren has meeting with Carsten and group to focus on design
- Will need to review what the versioning model looks like
- Manage in the CESSDA tool and export for the purpose of publishing - it's a one-way path
Colectica registry enhancement
- Write documentation that can be put on the site
- When video from EDDI is available we can provide link as an introduction
DOIs for high-level documentation
- How do we organize the document (DOI for each full document or parts)
- How do we handle versioning
- Using the established DDI use of
- Purpose of the DOI is to use in citing documentation and should point to the published PDF; the Sphinx online version is the development electronic document
- Main point of DOI is that it is immutable so DOI could be a root document that may include a change log
- Difference between a journal article and our documentation is enforcing what happens inside the document
- The purpose is to provide a pointer to the document
- Need to clarify a versioning policy - we have to be careful that we are clear about what constitutes a new DOI and how content is packaged
- What do we want to put a DOI on? The thing we publish as the standard is the schema (including field level)
- The documentation needs some connection to the schema as it is separate
- What is the purpose of a DOI and is it the most appropriate? What needs to be covered? What's the best way to do this?
- Link is a citation
- The API referencing guide for standards uses the URL of the agency, in this case ddialliance.org... so does that need a DOI?
ACTION:
- We've had a number of requests so let's draft something up internally
- Start a JIRA ticket in TC and put it out to the user group: send a note to the DDI Users Group and add a JIRA ticket or send an email
Repository organization for standards and product generation tooling
- As the CDI document gets more finalized keep group informed
- Discuss intent of document with Achim
Timeline for DDI-CDI - estimated to be provided to TC Jan 19
Face-to-Face meeting planning
- Contingent upon travel restriction and approval processing
- Check on contingency of moving money to next year, tentatively July
- Check on space availability
- Do a virtual thing to put some time aside to work on COGS and get that done
- Transition planning - brainstorming session with moderator; leadership, organization, future proofing technical and processing aspects, shifting to management of a suite of products as opposed to a single standard
ATTENDEES: Wendy, Jon, Darren, Oliver, Dan S., Larry, Flavio, Johan, Jeremy
Mappings:
- division of work
- TC should drive product mapping
- Who should be driving mapping of different types
- How to coordinate with work taking place with other standards such as GSBPM, CoData, SDMX etc.
TASK:
- What needs to be finalized on the high level model?
- Expand on what is on the page - how would this work?
- How do we describe relationships at different levels (conceptual to automated transformations)
- Bring a proposed framework to the SB
- Wendy, Flavio, and Jon - pull our stuff together and discuss with Ingo in liaison role in terms of meeting our goals
CV versioning of URI in CV manager
- Oliver and Darren have talked to Taina - there are discrete version numbers in CV manager
- Notification mechanism to follow our versioning processing
Clarifying points:
- Use of names in the URI for concept ID
- Break between what has been done and what is convenient
- Why is the version part of the short name rather than a separate section
- Flexibility to do what we need to do - in the URIs being generated for the CESSDA-specific CVs, versions are brought out in an entirely different way
- If we can't get it as part of the output parameter we can (worst case) adjust this using regex magic in the transformation (see the sketch after this list)
- Prepare a document containing final proposal on what the URIs should look like in the end. This includes the base content, wild card mechanism for version or vocabulary trim name
- ICPSR currently runs the web site - a DNS sub-domain would be desirable
- Is there a requirement to support a SPARQL endpoint query? It would support distributed queries and could easily be added in the future if requested
- Is there a use case for a REST API? Not currently; even weaker argument than for SPARQL
Would there be future support for Statistical Classifications? - Do we have flat or hierarchical access to codes?
- Darren and Oliver are talking with Taina, John Shepardson and others working on the CV Manager
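A minimal sketch of the "regex magic" fallback mentioned above, rewriting a CV-manager-style URI into the canonical DDI CV URN; the input pattern and host are assumptions for illustration, not the agreed formats:
import re
from typing import Optional

# Hypothetical CV-manager style: https://<host>/vocabularies/<CV>/<version>/...
CESSDA_STYLE = re.compile(
    r"https?://[^/]+/vocabularies/(?P<cv>[A-Za-z]+)/(?P<version>\d+(?:\.\d+)*)"
)

def to_ddi_urn(uri: str) -> Optional[str]:
    """Rewrite a CV-manager URI to the canonical DDI URN (urn:ddi-cv:<name>:<version>)."""
    m = CESSDA_STYLE.match(uri)
    if m is None:
        return None
    return f"urn:ddi-cv:{m.group('cv')}:{m.group('version')}"

if __name__ == "__main__":
    print(to_ddi_urn("https://vocabularies.example.org/vocabularies/AggregationMethod/1.0/en"))
    # -> urn:ddi-cv:AggregationMethod:1.0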
ADMINISTRATIVE:
No meeting 25 Nov, and 2 Dec
Next week Codebook
ATTENDEES: Wendy, Johan, Larry, Oliver, Dan S., Darren, Jon, Barry, Flavio
Resolution work update:
- In touch and prepared schedule for specific work
- Transformations up and running mid-November
- Loss of Bitbucket Services feature should be no problem - webhooks, services may have been used with Lion
HTTP based work is planned to be completed by EDDI
Mailman lists
- Mailman history is being used to search for stuff
- Is there a way to preserve the history in Google Groups through a transfer - ACTION: Wendy check on this
- Discourse is a bulletin board and email list combined https://www.discourse.org/
- Who would set Discourse up - there would be a cost for this
- https://discourse.mozilla.org/ example
- Open source, but with a cost for hosting or paying someone to host - 85% discount for educational use
- Google groups is used by Developers Group with no problems
- The broader question is whether a browser-based board makes it easier to see the conversations without needing to join the group
- $15 a month isn't really an issue
- ACTION: Summarize this discussion and contact Jared (loop Dan in this discussion)
CODEBOOK work
- Simple to Conceptual
- Highlight in the public review
- Document explicitly in inline documentation
- NCube alignment - addition of attribute and cohort region
NO Meeting on 25th November
Topic for next week:
- Mappings across standards - structure the work on this, type of mappings, etc.
- Oliver and Darren are having a meeting on versioning URI in CV manager
ATTENDEES: Wendy, Jon, Dan S., Oliver, Darren, Flavio, Johan, George (first topic only), Barry
AGENDA:
- Codebook issue resolutions
- CDI update
- RDF Resolution update
Codebook-
- The SDTL set of terms and properties for recording complex survey weights that elaborate on Stas' issue
- Stas has been making comments but in general seems to agree with the content
- It's a complex framework and it would be helpful to have insight from those involved in complex survey weights from various locations
- It's the complexity of properties that needs to be considered and it would be helpful to have more comments
- It would be helpful to contact people first
DECISION:
Review of content that is product independent - do this first and then put in a future version of Codebook; it would be good to look at this from the perspective of Lifecycle which is not just descriptive of past actions
https://docs.google.com/document/d/1wIBolHEKTi5_JujKBkdr2s92PVpQKp_z8tyPbTqQV1E/edit?usp=sharing
Review of use in Codebook as a separate process - beta version 2.6 and then official release
ACTION: create JIRA issue for separate review of weighting content to track who is being asked to look at this (TC-225)
CDI Update:
- Liaison with W3C (will be on next SB agenda) - One of the things at Dagstuhl: Pierre-Antoine was starting work on a CDI RDF representation, but there were some issues with the model
https://www.w3.org/2001/11/StdLiaison
Approaching a form of liaison with DDI through the Scientific Board - RDF styles across the suite - how much uniformity and how much individual flexibility should be supported
- Internal DDI vocabulary - with extension structures (this appears to be the case)
Keep track of what is going on here
RDF Resolution:
- HTML output - work with Sanda and Michael
- Current content is very out of date - recommendation is to hold until resolution is complete
- Is there an option for producing static content (SKOS file, HTML, CodeList-XML) and then loading into triple store when that is ready
- We need to nail the path down...currently http://rdf-vocabulary.ddialliance.org/[product name] also https://...
- HTML page needs to be a joint process with Michael
- It would be one of the products in the bitbucket repository where Michael could retrieve it to update
- That would be ready in a few weeks
- Michael seemed fine with working with a git repository
Added content areas from MFP work:
- Content would come after upgrade - Set up JIRA issues regarding the broad areas we want to look at and identify location of input materials - get wish list together and review wish list (filed TC-223)
- Follow-up from last week:
Filed TC-224 Documenting procedures governing management of requirements, development work, feature roadmaps, etc. to capture review of existing documents
Sent to Oliver and Darren: (questions or possible issues in the RDF resolution work from Wendy)
Right now this domain is managed by ICPSR(Michael) rdf-vocabulary.ddialliance.org
Should this be the URL associated with int.ddi? or should each product have its own subagency (ddi.lifecycle, ddi.codebook, ddi.xkos, ddi.cdi, ddi.sdtl)
Does SDTL currently use a URN or just the rdf-vocabulary.ddialliance.org?
XKOS
RDF namespace: http://rdf-vocabulary.ddialliance.org/xkos#
XKOS documentation: http://rdf-vocabulary.ddialliance.org/xkos.html
RDF Turtle file: https://rdf-vocabulary.ddialliance.org/xkos.ttl
Controlled Vocabularies (note that Controlled Vocabularies has its own subagency, which is actually "int.ddi.cv")
Short Name: AggregationMethod
Long Name: Aggregation Method
Version: 1.0
Version Notes:
Canonical URI: urn:ddi-cv:AggregationMethod
Canonical URI of this version: urn:ddi-cv:AggregationMethod:1.0
Location URI: http://www.ddialliance.org/Specification/DDI-CV/AggreagationMethod_1.0_Genericode1.0_DDI-CVProfile1.0.xml
Alternate format location URI: http://www.ddialliance.org/Specification/DDI-CV/AggregationMethod_1.0.html
Alternate format location URI: http://www.ddialliance.org/Specification/DDI-CV/AggregationMethod_1.0_InputSheet_Excel2003.xls
Note that the current URIs are in error and do not/will not resolve
What needs to be done about this? Short term, long term
ATTENDEES: Wendy, Flavio, Jon, Dan S., Jeremy, Johan, Larry
CDI update:
- CDI will not be delivered until end of December
- Achim needs to work on the production process and making it transferable post-retirement
- Only XML representation in initial package - RDF and JSON need to follow quickly, need to make sure it is in line with the resolution system
Broader coordination of implementation formats across the DDI suite:
RDF and JSON representations:
- How coordinated do we need to be across the board in DDI products?
- We have XKOS already and we have the style of output from COGS for RDF and JSON, CDI is also planning this as a quick follow up to initial XML release
UML modeling:
- Role in DDI suite
- Output from COGS - goal is canonical XMI
Discussion:
- It would be worthwhile to discuss sustainability of CDI post initial release with CDI - both the XMI content and also the production stream
- Re-look at the platform-independent and platform-dependent approach - The current structure has been collapsed into a single model approach for expediency.
- RDF is getting pushed into the model similarly to how XML was done in the past, again partly due to the collapse into a single model
- This differentiation doesn't have to happen before publication but there are RDF issues that need to be addressed - how modeling supports the RDF
- There may be separate models to support the RDF
- CDI vocabulary - mapping has not been looked at, and since you can mix and match in RDF, mapped content could then be used - RDF is being generated from the model in a prescriptive way. It's not clear you'd want to be chasing a changing external world as part of the specification. There is a referencing mechanism that helps support flexible use of or relationship to an external vocabulary
- Requirements are similar across the suite of products but we need to look at them as a suite in terms of approach
- What is the work product and how does that relate to being a standard?
- Lifecycle, where the canonical product is currently XML, is moving towards a model with multiple implementations. So what is the standard? The model? The implementations?
- For users of specific implementations of CDI, the XMI is the standard and the implementations (XML, RDF, JSON...) are the recommendations
- Lifecycle is moving toward COGS CSV files as the expression of the model, with XMI, XML, RDF, JSON, etc. expressions (serialization-agnostic model)
- XMI is in the same position as the XML, as there are things that are done because it's UML. XMI is a UML representation that is intended to be portable (same conceptual scheme so you can return to the model losslessly)
- There could be an exact mapping between CSV and XMI to capture the needed XMI content that is not currently in COGS.
- There is a document on the use of UML and the XMI output in CDI that is close to finalization
Management of requirements and integrated framework:
https://docs.google.com/document/d/1TEy2zdxfARDgkOABb6kzuAmHK2oBJ0f5CmxL9F9pEbQ/edit?usp=sharing
Issue presented:
- Better management of requirements and integrated framework
- How to bridge the divide between the products and make decisions about them
- How do we look at the requirements and make roadmaps
- This needs to be formalized as the complexity of what we are trying to manage increases
- There is change management required all over the place
- How we capture requirements has varied in the past and can get lost over time. There is no good systematic way of recording and maintaining what was done and why.
Discussion:
- We have a process for lifecycle and codebook of what changed and why and we need that across the board
- For example this got captured in Codebook 2.5 to 2.6
- Similar in Lifecycle
- Working groups are a functional aspect of that in terms of implementation
- A concern with the XMI is that tracking changes is not very straightforward
Regarding versioning of UML: https://sparxsystems.com/enterprise_architect_user_guide/14.0/model_repository/versioncontrol.html - For XKOS there is tracking with GitHub
- For SDTL there is tracking with GitHub
- Integrated approach - a bit more than product specific approach
- The shift of CDI from development mode to production mode may be more useful
- Some of the discussions about Lifecycle have been when do we bring in development from work done in the Moving Forward Project. We have this on our agenda and need to get this moving.
- There is a need to address a strategy for requirements in terms of product creep
- Entities that look too much inward end up disappearing - a need to look both at the current users and the growing environment and what needs may arise from that external group
- Standards meeting need of particular domain and others that support across different standards as opposed to the One Ring to Rule them All
- We need to capture and management change information between versions of XMI particularly if this is where development is taking place
ACTION: There is an existing process document for the TC; update it and pull this content into that document
- Terms of reference with the Scientific Board
- Look at how that document can be updated to reflect these issues and update processes
- Policy and decisions are made during recorded chats etc. Look at tools that can help coordinate this material.
- There has actually been a lot of work done on this over the years and it needs to be organized so it is clearly available, updated, and reviewed for gaps as well as new needs given change to a suite of products and changes in the environment and user groups.
ATTENDEES: Wendy, Dan S., Flavio, Johan, George, Jon, Darren
Question regarding handling default values by validators
- Should be done in the tooling
- the tool (editor/validator) should simply provide the default value
DDICODE-52 and related weighting and sampling issues
- parameters that are used with weighting when you reuse the data
- written from the point of view of the user - detailed guidance
- Stas idea of DDI providing guidance
- Creation of data set - expectations
- This would be inside a DDI Codebook or Lifecycle and plan is already there
- SDTL would provide a way to capture all of these elements and pass them into
- In Lifecycle perspective in principle if you are doing the strata you'd already have it in Lifecycle and can use it to populate this information
- there is not yet a smooth workflow where you specify --> implementation --> alterations --> result
- SDTL should be able to capture the analysis
- The language is the same between "how did I do it" and "how should I do it"
- Should this info on samplingDesign belong in both places?
- Different snapshots of views from planning, implementation, effects
- Long term covering original plan, what it turned out to be.
- Short term for codebook -
- need a good description of the sampling process
- guidance on how to analyze the data in that data set
- Stas' original point is important - make it easier to use complex weights (cluster, replicate, etc.) with examples - provide sufficient content and provide examples
- Documentation is critical - on the side of producers they all know this stuff - the statistics are well understood
- Boils down to equations and math
ACTION:
Jon - will find people familiar with panel studies, clustered stratified to review this content
ATTENDEES: Wendy, Jeremy, Dan S., Barry, Jon
Technical Work Updates:
- LOD setup update
- Agency Registry work before EDDI
Codebook:
- Stas's example is based on what he was working on and can't be descriptive for the standard
- George's is much more detailed
- Find people who have worked with weighting, run it by some other people, and see if it works
- Write up and find specific people to review
Resolution Document
- Streamline, limit to what the DDI Agency Registry does and coming updates
- Limit the agency option to a statement that they can provide a range of services from a static page to a fully negotiated resolution system
- As agencies begin providing access at varying levels provide links to the options (similar to xml examples, tools, implementers, etc.)
ATTENDEES: Wendy, Jon, Dan S., Oliver, Darren, Larry, Flavio
LOD document:
- Clarify URL for CV and vocabularies
- Michael could set up any redirects required
- int.ddi.cv agency name
ACTIONS:
- write up note on costs based on comments
- Offer Darren and Oliver/GESIS service for set up and use next year
- Send to Jared - PDF of above document sent 2021-09-17
- update TC Statement on Resolution for review by TC members
Agency Registry Review Criteria:
- Reviewed document and discussed the intent of DDI in terms of registration - Broadly accepting
- Current system for addressing Robot filing
- Approval process basics
- Terms of use option
ACTIONS:
- Write up current basic verification and approval process
- Summarize Terms of Use issues and forward to Scientific Board for clarification and encoding of Terms of Use for Agent Registry
ATTENDEES: Wendy, Jon, Darren, Oliver, Dan S., Larry, George
- The CESSDA CV manager is being sub-contracted out and so change in their system is going to be slower for the near future
- The output from the CV manager can be put into a richer environment
- Bring XKOS into LOD environment
- The path described will get us there
- We need to not focus on what CESSDA will do; depend on it as a management platform, not a dissemination platform
- We can massage the output from the CESSDA tool to correct URIs and create the CodeList and HTML page for DDI
- Add some specific LOD reference examples to the resolution document
- Additional specific comments were made on the documents
ACTION:
- Complete costs section of LOD-Infrastructure document
- Add suggested areas to Resolution document and respond to all comments
ATTENDEES: Wendy, Dan S., Barry, Oliver, Larry, George, Jon
Discussion of document specifying work needed to implement a DDI resolution system for CVs and RDF vocabularies
https://docs.google.com/document/d/1rPcg44jV2xmqGTxTMKZsFgFgUhxtGeSpUt9DiE7pX7k/edit?usp=sharing
- CV repository will be created in next 2-3 weeks. From CESSDA repository to Bitbucket.
Bitbucket provides pipeline options which Oliver can use to set this up on a regular basis (monthly to start with) - Does the script run in the cloud? This may need to be decided by the implementer. This may be a simple docker compose running somewhere.
- The updated docker images would be created each time
- The imported data would be the state of the actual running images of those instances. They are recreated with fresh applications each month or so. We don't need to worry about recovery of those container instances because we will always be able to recreate them.
CV maintenance:
- CVs are created and maintained in CESSDA. Exportable into SKOS
- The CVs are pulled from CESSDA and translated to provide the content in DDI structures. A Bitbucket repository will contain all of the DDI content (SKOS, CodeList, HTML)
- This repository would be updated monthly (more or less) using a Bitbucket pipeline which takes the CESSDA SKOS and creates the formats needed by DDI (a minimal sketch of such a step follows)
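An illustrative outline of the kind of step such a scheduled pipeline could run; the export URL, file names, and output layout are placeholders, and the real SKOS-to-CodeList/HTML transformations live in the TC's own tool:
import urllib.request
from pathlib import Path

EXPORT_URL = "https://vocabularies.cessda.eu/..."   # placeholder, not the real export endpoint
OUT = Path("ddi-cv-output")

def pull_skos() -> bytes:
    """Fetch the multilingual SKOS export from the CV manager."""
    with urllib.request.urlopen(EXPORT_URL) as response:
        return response.read()

def publish(skos: bytes) -> None:
    """Write the DDI publication package: SKOS, CodeList, and HTML outputs."""
    OUT.mkdir(exist_ok=True)
    (OUT / "AggregationMethod_1.0.skos.rdf").write_bytes(skos)        # SKOS (corrections applied here in the real tool)
    (OUT / "AggregationMethod_1.0_Genericode.xml").write_text("...")  # CodeList output (stub)
    (OUT / "AggregationMethod_1.0.html").write_text("...")            # HTML page (stub)

if __name__ == "__main__":
    publish(pull_skos())
    # The pipeline would then commit ddi-cv-output/ to the Bitbucket repository.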
RDF vocabularies:
- XKOS is currently maintained on GitHub and just requires a pull
- URI resolution will need the new stack. But the actual writing of the files is taken on by CESSDA as part of the SKOS.
- Similar to the ELSST thesaurus, at least for the publicly visible site
- Pubby and Apache Jena Fuseki: the content is stored as triples in a graph database and can be served in different formats such as JSON
- https://dbpedia.org/page/The_Guardian
General questions/comments:
What if the docker container gets moved to a different host?
This is just the infrastructure. Within a restart we need to ensure that a fixed address for that infrastructure is kept. We would need to bring this into the namespace of the DDI Alliance
A triple store with a SPARQL endpoint and Pubby will provide a wide range of formats (see the sketch below).
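A sketch of what the stack would make possible once running; the SPARQL endpoint URL is a placeholder and the XKOS resource URI is the namespace listed earlier in these notes:
import json
import urllib.parse
import urllib.request

FUSEKI_ENDPOINT = "https://rdf-vocabulary.ddialliance.org/sparql"   # hypothetical endpoint
RESOURCE_URI = "http://rdf-vocabulary.ddialliance.org/xkos"         # XKOS namespace

def negotiate(uri: str, accept: str = "text/turtle") -> bytes:
    """Ask the resolver (e.g. Pubby) for a specific RDF serialization via content negotiation."""
    req = urllib.request.Request(uri, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def sparql_select(query: str) -> dict:
    """Run a SELECT query against the triple store (e.g. Jena Fuseki) and return JSON results."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(FUSEKI_ENDPOINT, data=data,
                                 headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(negotiate(RESOURCE_URI)[:200])
    print(sparql_select("SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"))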
ACTION: Darren needs to go over it and add cost items; if no substantial changes we can then finalize and contact Jared
ATTENDEES: Wendy, Oliver, Dan S., Flavio, Darren
Resolution work:
Darren and Oliver will meet tomorrow
Oliver has been digging into the details of Darren's suggestions as well as how GESIS manages data content
Barry and Wendy will do a zoom edit
Goal is to have these prepared by mid September
Upcoming:
- CDI model will be done in a few weeks but documentation still needs to be completed and packaged
- Codebook issues regarding
Work tasks starting in October/November:
This is a list of development areas we have identified where work has been done during the Moving Forward Project that should be reviewed for incorporation into Lifecycle. The approach will be to identify members interested in specific areas and then solicit a broader group to work on the topic outside of the regular TC meeting (similar to the Codebook approach). The items noted as priorities are those that should be addressed prior to the spring in-person meeting as they will have some impact on how Lifecycle 3.3 is serialized into 3.4.
- After CDI review is done look at Lifecycle data description again in terms of logical data sets and how files were made in the 70's and how that could be simplified along with alignment with CDI (PRIORITY)
- Data cataloging at different levels
- Geographic specification
- Questionnaire improvements aligning response domains with the variables they create, output parameters (select many - allowing a variable to reference a specific response) clearer mapping of data through the capture and process system
- NCubes and NCube definitions and we now have different levels of variables. Compatibility with SDMX 3
- Couple different types of physical records, substitution groups, clarification and simplifications. Aggregates with definition. (PRIORITY)
- Process models
Things to think about for 3.4 work:
- Alternate namespaces in substitution group
- Complex nested choice sections
- Use of abstract classes in 3.3 helped a lot in preparing 3.4
- Upper model work needs to continue - how products mix together (PRIORITY)
ATTENDEES: Wendy, Jon, Oliver, Larry, Barry, Dan S., Darren, Flavio
Resolution System:
- Reviewed changes to the DDI Resolution document. Notes were made on document
- Darren and Oliver will meet in the next few weeks to record how the URN and URI content will be resolved and how different versions will be represented (this is particularly important in terms of CV content).
- In regards to serving XKOS, that is relatively straightforward
- SKOSmos usage
- Underlying content (metadata schema)
- It ultimately comes back to the hosting question
- Do we have anyone specific in mind to host this (someone's existing cloud or setting up a new cloud account that someone will manage)?
EDDI update:
EDDI deadline for submissions a couple of weeks away
There are plans for a training event done before or after EDDI
SDTL would be interesting
It would be useful to have something on URN and http resolution - general route, what we're doing and why we're doing it - Dan could submit on the registry base work being done by Colectica
CDI vote:
CDI will probably be delivered to TC in early September. While trying to meet the end of August date, early September is probably more realistic. They need it finalized prior to the Dagstuhl workshop. The vote period is only 2 weeks, so delaying in order to provide the webinar for voters should not cause a problematic delay. It should also still allow time to put Codebook 2.6 out for public review prior to EDDI without overlapping the CDI vote period.
Thinking ahead:
How do we want to organize other work tasks noted in the workplan? Continued work on comparison and mapping of suite products, DDI Alliance pages, and Codebook future structures. Also longer term goals, for example, how to identify and set up working groups on development work from Moving Forward that needs to be integrated into Lifecycle and possibly Codebook in the future (questionnaire, data description, geographic description, separation of logical and physical, clarification and simplification, descriptive content for Codebook).
ATTENDEES: Wendy, Dan S., Larry, Flavio, Barry
CDI:
What changes have been made since public review?
- Minor changes depending on what is being introduced
- Might be some larger changes - which may need review by TC
- Not new features so not a need for public review
- TC review - would be helpful
- Module separation for example, linking approaches
- Units of measurement is new but essentially a refinement of a controlled vocabulary
- Controlled Vocabulary clarification of reference
- TC should review for comparative conflicts, alignment with other products
- TC needs to be aware of consistency in terms of identification and referencing, controlled vocabularies and other cross cutting ideas and approaches
Are there additional features of UML being used?
- Canonical XMI constrains the amount of EA UML that can be used
- We need the document that describes which elements can be used
Webinar for DDI voting members should focus on:
- How it fits into the DDI Suite - role
- How does it do its role of integration - what is supported in terms of content
- How does CDI fit into the suite of products - addition not a replacement of lifecycle and codebook
- Practical example of what it is intended to do
- Do we have a use case of putting CDI together with codebook or lifecycle - focus on the integration piece
- The webinar could turn into a promotional (what it can do for you)
Approach of Codebook alignment with Lifecycle:
Codebook is descriptive in nature while Lifecycle takes that descriptive content and adds machine-actionable content to better support searching, management, access, and metadata-driven processing. Therefore Codebook's descriptive content should move cleanly into Lifecycle and any actionable content (controlled vocabularies, specific content, etc.) should also be easily transferable.
- Good elevator speech on what the relationship between Codebook and Lifecycle
- In Codebook 2.5 we started adding identification for Lifecycle - identification of Lifecycle equivalents - content equivalence is important - making sure that the base type is identifiable if it's identifiable in Lifecycle - This last point is important and will be noted in both the alignment issues and the documentation issues in Codebook to more fully inform users about relationships to Lifecycle at the element/attribute level and in the high-level documentation in terms of general use of the specification
FUTURE MEETINGS: Resolution documents will be scheduled for August 19 when Oliver is available
No meeting
ATTENDEES: Wendy, Dan S., Jeremy, Oliver, Darren, Larry, Barry
Guest: Jared
DDI Alliance agency level resolution:
CVs
RDF and other expressions of specifications
Options:
- ICPSR - no, DDI does not have access to server or staff to support this work
- Cloud space with maintenance staffing from member organization - The Alliance currently does not pay for any cloud space
- Hosting and maintenance staffing from member organization
- External hosting/maintenance service
DISCUSSION NOTES:
Needs more specification before we can go out - Skosmos can do RDF resolution and HTML resolution
Runs on Linux virtual machine - Might work for controlled vocabularies but won't support specifications (XKOS vocabulary)
- What is described in the Darren/Oliver document is technology stack that would be needed for that type of resolution using 2 Apache tools
It should be enough as it is stored and developed somewhere else
System would be straightforward and served through 1 or 2 docker images
Should be enough to throw it onto a cluster and then destroy on a monthly basis (monthly update of system software); pull new docker images for the Apache tools
Update of the system would be automated by a monthly pull and replace
Maintenance of content would be separate from system maintenance (Kubernetes)
- https://kubernetes.io/
https://www.digitalocean.com/pricing#kubernetes - With the pricing they show there $10/month for cost of maintenance
- Set up is the main effort
- Unlike the cloud setup used during Moving Forward, where we had to do all the updates, which caused some vulnerabilities we had to deal with
- We could get rid of all system updates by using docker images and refreshing system data once a month
- Find someone to initiate and implement that set up
- This still means setting this up
- Is CESSDA a possibility
- Is CV manager dockerized? We think so
- The only interface we would need to the CV right now would be straightforward
- $20 gets you way more than we would need
- A droplet is something like a server (virtual server)
- 1 MB of CV content, a few more KB for XKOS
- This could scale up by buying larger virtual machine
- Digital Ocean was the one that first dropped out when he searched
- $10-20/month would be an ongoing cost ($12,000-$24,000)
- Set-up costs depend on the amount of Kubernetes experience the person has
If CESSDA helps with set up we know they have a lot of Kubernetes experience
Probably worth paying for that experience and knowledge - Definitely seem to be talking under $10,000 USD
Define image needed, we want to get this set up on a given platform, want someone to set this up, with refresh once a month
1-2 days for a Kubernetes expert at $1,000 to $1,500 per day would be about $3,000, so definitely under $10,000 - Could be set up with a Bitbucket repository
Is this refreshment automatically set up in Kubernetes? - John Shepardson and Matthew Morris
- What are we hosting this on - software stack
The Apache stack was set up for CVs; Apache Fuseki is a triple store and won't do full content negotiation - tools for serving up RDF
- Need to see if Skosmos can deliver for this
DECISION
- Pursue the Cloud with the monthly replacement software approach noted in discussion. Identify DDI members who can provide set-up expertise.
- Write up specification with ultimate deliverables and the time frame
- Maintenance side is something for the Executive Board for on-going costs (cloud space and maintenance cost)
- Provide a nice succinct proposal - Include on-going cost of space and maintenance costs - planned maintenance costs
- If we exceed the $2,000 amount we would need to go back to the board for approval
- Provide background paper on options for resolution levels currently in the works
- From Jared's perspective the sooner the better to get this through bureaucracy
- We can unofficially seek support from interested people
ACTION:
Oliver is unavailable until 15 August
Darren will provide draft of the proposal with work specifications with deliverables and time-frame
Wendy will work on finishing the DNS Resolution document for background providing clear levels of activities that currently need to be managed by the agency (in this case int.ddi)
ATTENDEES: Wendy, Dan S., Larry, Oliver, Darren, Flavio
AGENDA:
- XKOS and Paradata groups: topics on which TC support/contact will be needed this year
- XKOS publication process for Best Practices - opportunity to look at any official publication process or a uniform look/feel for Best Practice documents. How to reference an XKOS classification. Look at common features and content.
- Paradata: Opportunity to work with them in terms of preparing new content area where it may be added to more than one product where we want consistency between the models in terms of making sure content is easy to share or transfer between products
- Discussion:
- Formal definition expressed as an ontology or model would be good to work on
Darren would be interested in working on this
CDI has talked about attributes down to the data and how this could be supported by an ontology - Model before ontology would be useful
- Clarity between a measure and a variable and the instance of a measurement, use of conceptual value (instance of a measurement, expressed as an instance value). Paradata has information on the instance of a measurement. Tie into variable cascade.
DNS Resolution:
Draft document covering preparation of document on current status of agency resolution, DDI Alliance resolution service for sub agencies and ddi published content, options for resolution by agencies currently supported or in progress
https://docs.google.com/document/d/143tERbtM8Eze-z28jCZMxTm8bHaGfKh6/edit
Comments have been added to the page including additional areas that need to be covered and clarifications. We will work on this over the next month. Target audience is the Scientific Board but content should be usable for informing ddi agencies and end users.
Preparation for meeting with Jared on 22nd
ATTENDEES: Wendy, Dan S., Larry, Oliver, Barry, Flavio
Follow up on last week's action item
- Talked to Barry and Oliver about generating announcements from change log information. This can then go out through Jared. We need to bring him in as that gets further along.
- Oliver is working out whether there is an API available for getting change log information. He will be working on the scripting in the coming weeks.
- We can look at and play with timing and publication options for Michael. Work with Michael on the best process. With the Bitbucket repository it is easy to identify change.
Discussion of next week's Scientific Board agenda
XKOS and Paradata
- Publication targets of Paradata? Lifecycle, Codebook, CDI attributes at various levels etc. Paradata. Do they plan on an RDF representation? Definition of Paradata.
- XKOS time line for the year to 18 months
DDI URN resolution - Status update - goals and policies to be formed
- Is it possible that people want to register but don't have the capability to provide resolution?
- The HTTP resolution of the agency ID is the ability for different organizations to create a templated URL for different user services (direct link to DDI, a viewer page). There will be the ability to use the tokens of the DDI URN to create a URI that can be accessed (a minimal sketch appears after this list). This gives agencies a lot of flexibility to do what they need to do.
- No agency is required to have an agency-based resolution system. For the web browser resolution you just put in your web site.
- Anyone can register an agency. There is no requirement for resolution. But if they want it to resolve in the future they can update their HTTP URIs to new content resolution.
- The HTTP resolution is scheduled for sometime in September.
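A minimal sketch of the templated-URL idea above: an agency registers a URL pattern and the resolver fills it with the tokens of a DDI URN. The template syntax, example agency, and item are made up for illustration:
from string import Template

# Hypothetical template an agency might register for its viewer page
TEMPLATE = Template("https://example.org/ddi/view?agency=$agency&id=$id&version=$version")

def resolve(urn: str) -> str:
    """Turn urn:ddi:<agency>:<id>:<version> into the agency's templated URL."""
    scheme, nid, agency, obj_id, version = urn.split(":", 4)
    if (scheme, nid) != ("urn", "ddi"):
        raise ValueError(f"not a DDI URN: {urn}")
    return TEMPLATE.substitute(agency=agency, id=obj_id, version=version)

if __name__ == "__main__":
    print(resolve("urn:ddi:int.example:SomeItem-1234:1"))
    # -> https://example.org/ddi/view?agency=int.example&id=SomeItem-1234&version=1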
UPCOMING MEETING:
Reminder that Jared will attend the 22 July meeting discussing computing and management support for resolution of DDI RDF Vocabularies and DDI Controlled Vocabularies.
ATTENDEES: Wendy, Oliver, Darren, Larry, Dan S., Flavio, Jeremy, Michael, Sanda, Taina, Barry
AGENDA: Clarification of steps needed to finalize the publication of DDI Controlled Vocabularies on the DDI web site at the level of access currently available. (NOTE: this does not include content resolution of items in vocabularies which is being addressed separately)
STEP 1: getting SKOS output correct from CESSDA
STEP 2: transformation to correct SKOS output, create the CodeList output and HTML
STEP 3: delivery of updated content to Michael for posting on DDI site
Step 1 Notes:
There are identified issues with the CESSDA SKOS content which have been noted by Achim and passed to the TC. Franck Cotton reviewed the SKOS example and validation report, identifying specific problems that required fixing, best practices that should be followed, and other related comments. This information has been passed to Darren and Oliver. While these issues involve the CESSDA system, TC will address the issues in the transformation of the CESSDA output into the DDI published content. Darren will provide this feedback to CESSDA. TC will alter its transformation scripts over time as changes are made in the CESSDA system.
Step 2 Notes:
Tool: a command-line tool that pulls content (the SKOS multilingual file from CESSDA); transforms it to SKOS, CodeList, and HTML; and prepares a package of content to be uploaded to the Alliance site
Options for timing:
- Full-fledged update of every vocabulary on a regular basis, which would reduce the manual work; only DDI; this would track new language editions; periodicity can be determined by the CV committee (monthly basis)
- Publication basis: Run the command line tool and pass on the file
DECISION: Tool will be run on a monthly basis. This has been agreed upon by Sanda and Taina. The CV group should review during the year to determine if the timing is correct. Periodicity can be adjusted in the future based on this review.
SKOS content:
- Issues around publishing - how to notify users of new publications; there is no metadata providing the equivalent of change files
- The way CESSDA is handling versioning and changes in the metadata is wrong and needs a ground-up correction (personal view). This is a long-term issue.
- We could address the versioning regarding language information at the DDI level, effectively consolidating and creating a change log.
- In order to fix the versioning and lifecycle management, how do we provide a uniform versioning system?
- How to incorporate change logs into SKOS and other output
- For example, if you have a URI of a Norwegian label you know the source concept but you don't have information on comparability between version numbers.
- What we actually need are relationships between versions and concepts: isNewVersionOf, isPreviousVersionOf, isSameAs at the item level (see the sketch after the ACTION items below)
ACTION: Oliver needs to determine if we have enough information to make these additions or whether we need to retrieve it separately
ACTION: Wendy obtain and verify versioning of CVs in DDI as stated by group and clarify what will be expressed in DDI
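A sketch of the version-relationship statements discussed above, emitted as Turtle; only the relation names come from these notes, while the ddicv: predicate namespace and the 1.1 version are placeholders:
OLD = "urn:ddi-cv:AggregationMethod:1.0"
NEW = "urn:ddi-cv:AggregationMethod:1.1"     # hypothetical next version

PREFIXES = "@prefix ddicv: <http://rdf-vocabulary.ddialliance.org/cv#> .\n"   # assumed namespace

def version_links(new_uri: str, old_uri: str) -> str:
    """Produce the forward and backward version links for a CV (or a CV item);
    unchanged items could additionally be linked with isSameAs."""
    return (
        f"<{new_uri}> ddicv:isNewVersionOf <{old_uri}> .\n"
        f"<{old_uri}> ddicv:isPreviousVersionOf <{new_uri}> .\n"
    )

if __name__ == "__main__":
    print(PREFIXES + version_links(NEW, OLD))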
Step 3 Notes:
ACTION: Need to agree on the actual URI pattern we want to use. The URIs are currently wrong and this needs to be corrected in the CESSDA tool
All files are in one folder and all file names contain the version number. Need to see what the pattern currently is and a suggestion for ddialliance/specification/ plus the CV code, the version, and the file format
Considerations --
- This should support having content resolution for SKOS supported somewhere at a different location.
- URI and URN: a non-machine-actionable one and a machine-actionable one.
QUESTION: could the DDI site run this automatically? What exists is currently being implemented in Java. TC needs to find a location to run that pull.
DECISION: Run the Java within Bitbucket. When we are displaying multilingual content, use a Bitbucket repository for that, retaining all copies. Michael could just run a clone of the repository.
ACTION: Michael and Oliver need to coordinate so that what comes out of transformation is what Michael needs for updating the site
ACTION: Wendy and Barry need to discuss options for informing public of new CV publications and updates. CV group should then work with Barry and TC to ensure that updates are appropriately publicized.
ATTENDEES: Wendy, Jon, Darren, Larry, Oliver, Dan S., Barry
excused: Flavio
AGENDA:
Content resolution work for XKOS and CVs - status and scheduling of meeting with Jared in July
FROM Jared - How about I attend a TC meeting in July to discuss next steps (I should be available the weeks of 12 and 19 July)? We'll want to identify any DDI member organizations that could provide the services you are seeking. Additionally, we should consider preparing for a competitive bid with outside vendors.
- Oliver tried to do the easy approach on a spare server and ran into a few issues
- Somewhere there should be a service provider able to provide what Darren is talking about, where we would just maintain the content rather than the service stack, if we could get that.
- Need to be sure we can port the RDF resolution to a different server for content negotiation
- Issue of different styles of content - hosted under the same domain name
- Had a look at the current state of XKOS; we only have HTML and full file access. Need to support current links including the RDF-Vocabularies
- We know software stack we want to use
- Ability to name paths
- Don't want to go off with just a short term solution but want to think ahead to something stable
- Member willing to support long term (5 year commitment with proviso that we don't want things scattered - need software stack maintenance as well as content maintenance)
- If we go with a general cloud we need to maintain stack, if a member then they should provide stack maintenance
- We can't mix CV and DDI RDF content resolution up with the broader discussion of a resolution system for DDI instance content (common content) because that has a different volume and different content maintenance needs
- For the July meeting:
- Get clear on what is needed, who could provide it, and if a call for bid for support
- Need to clarify if ICPSR is unwilling or unable to support
- We need to do resolution and presentation of DDI CV as well as RDF Vocabularies
- We need maintenance of management stack
- Maintenance of content
- Versioning of CV content by language is something that we need to deal with as this is separate from this discussion and something we deal with in the publication process
- Be aware of problems becoming captive to them or loss of service and content due to closure
- Clarify when dealing with Michael that it is just the publication of HTML and options for grabbing full content, but NOT resolution
- Material from CESSDA output to ICPSR for visual page publication
- SKOS to DDI CodeList
- The download and transformation steps have been placed in the tool
- Periodicity of update (routine updates for anything new - push or "publication driven - pull")
- Oliver will send out an example of what the tool currently is able to provide and then talk to Michael to work out the details from that
- July 15 or 22 - Jon won't be around on 15th
ACTION ITEM: Work with Michael to ensure that content is transformed into page content
ACTION ITEMS:
- Get materials together for Jared prior to meeting
- Ask Michael to attend the July 3 or 10 meeting to discuss publication of CVs (standard content - not resolution)
Work on additions to the product pages for Codebook (past development, change logs for published versions, etc.)
Looks good; continue with this
Issue with URL resolution resolved
network blip - why schemas should be stored locally for DDI instances
informationally you want the canonical, but functionally you want the local copy (a validation sketch follows below)
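A sketch of the point above: validate against a locally stored copy of the schema instead of fetching the xsi:schemaLocation URL. The file paths are hypothetical; requires the third-party lxml package:
from lxml import etree

LOCAL_XSD = "schemas/ddi-codebook/Version2-5.xsd"   # locally stored copy of the canonical schema
INSTANCE = "codebooks/study1234.xml"                # hypothetical instance to validate

def validate_locally(instance_path: str, xsd_path: str) -> bool:
    """Validate a DDI instance against a local schema; the network is never touched."""
    schema = etree.XMLSchema(etree.parse(xsd_path))
    doc = etree.parse(instance_path)
    ok = schema.validate(doc)
    if not ok:
        print(schema.error_log)   # report why validation failed
    return ok

if __name__ == "__main__":
    print("valid:", validate_locally(INSTANCE, LOCAL_XSD))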
Best Practice issues to be addressed
ACTION ITEM: (all completed - wlt)
- Remove page of topics
- Add label BestPractices to JIRA
- Create public filter on BestPractices label (inform Hilde and Arofan of new label)
- Change link to a link to the JIRA filter
ATTENDEES: Wendy, Larry, Barry, Dan S., Oliver, Flavio
Some notes regarding specific activities:
CV publication:
Resolution of URIs is still a problem with SKOS. Would it be possible to change the URL structure for these to be more in line with CESSDA repositories, adopting the URI part?
Oliver will write it up within the week and send it out. Find out from Achim what the validation problem was that he was seeing.
For the CVs this is a very simple approach. Implementation is about a day if ICPSR is supporting it. High priority: get this out in the next few months
HTTP agency resolution service: Should occur Aug/Sept; grid updated
Product mapping
Identified a few mappings: Variable Cascade; Statistical Classification; Process Model; Data Description; presentation is a primary issue and information may need multiple levels of presentation for different purposes from interpretability to actionability...notes added to grid
ACTION
ADD earlier versions to Codebook - find and add
NOTE: www.icpsr.umich.edu/DDI resolves to ddialliance.org
https://ddialliance.org/Version2-0.xsd
http://www.icpsr.umich.edu/DDI/Version1-2-2.xsd
https://ddialliance.org/Version2-1.xsd
Have emailed Michael for a listing of all URLs for Codebook schema and dtd instances.
FOLLOW-UP ON VALIDATING URLS:
I mis-moused copying the URL into a browser while we were meeting. http://www.icpsr.umich.edu/DDI/Version1-2-2.xsd does actually bring up the proper xsd.
When I later tried the xml file in oxygen it did validate.
I was trying to make two points
1) having a working URL in the schema location is helpful for files distributed by an archive. That makes it completely clear which schema is correct.
In the example below, the namespace URI (http://www.icpsr.umich.edu/DDI) is not specific to DDI Codebook, being the DDI home page. The version 1.2.2 may no longer be adequate now that version number is no longer tied to product line (Codebook, Lifecycle, etc.). A better practice might be to use https://ddialliance.org/Specification/DDI-Codebook/ for Codebook.
ATTENDEES: Wendy, Darren, Dan S., Oliver, Larry, Barry, Flavio
AGENDA:
CDI review package - update
- Expected delivery to TC end of July
- Contents look like they should cover all requirements including request for the profile of UML usage with which to initiate discussion of long term maintenance of CDI
- The initial decision will need to be whether to do a final review in addition to the short review included in the vote
Canonical URLs for DDI XSD files - Darren
- Discussion with CESSDA regarding rules for DDI profiles on the use of a canonical format for references to DDI schemaLocations. Problems with some systems have arisen with the alternate use of http and https references as well as the presence or absence of www
- Is there a canonical URL to use when referencing the schema location?
- Best practice to have schema files locally; remote schema location can have resolution problems, server down
- Tool should load schema separately from validating instance
- CESSDA wants to guarantee they are referencing the same schema files
- Sounds like it may be better to make sure tools handle variants
- Always have the official DDI namespace in the instances, which is set in the xmlns attribute
xsi:schemaLocation - There are currently various levels of capability in the CESSDA environment
- As for http and https: aren't some browsers not opening files served over http? - definitely an issue in some cases
- Is it possible that DDI does not do a redirect but delivers as http in those cases? How is this set up at DDI? Check this out. [testing shows both http and https resolve to https]
Example that bounces
<codeBook version="1.2.2" ID="ESS1e06.6" xml-lang="en" xmlns="http://www.icpsr.umich.edu/DDI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.icpsr.umich.edu/DDI http://www.icpsr.umich.edu/DDI/Version1-2-2.xsd">
Note that a later check using http://www.icpsr.umich.edu/DDI/Version1-2-2.xsd resolved to https://www.icpsr.umich.edu/DDI/Version1-2-2.xsd on the DDI site (a quick check of this is sketched below)
Tried this header using XMLSpy and it validated.
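A quick check of the redirect behaviour noted above, using only the standard library: fetch a schemaLocation URL and report where it finally lands (the URLs are the ones listed in these notes):
import urllib.request

def final_url(url: str) -> str:
    """Follow redirects and return the URL the server ultimately serves."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl()

if __name__ == "__main__":
    for url in ("http://www.icpsr.umich.edu/DDI/Version1-2-2.xsd",
                "https://ddialliance.org/Version2-1.xsd"):
        print(url, "->", final_url(url))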
ACTION: Is there a need for an Apache configuration change? Can Michael send the host definition for the DDI Alliance site? Set up a meeting with Michael, Oliver, Wendy, Darren, plus whoever wants to join.
ACTION REQUEST: Good to have a record of decisions and approach to provide guidance to community - Darren will supply when work is done
SDTL - sample description content
- This is about data sets being used with the appropriate weighting system or systems.
- Stas's proposal had 2 parts: 1) creating items in DDI that would describe what the weights should be, and 2) including the code (in Stata or another form) that should be used to accurately weight.
- Providing code for guidance is an excellent idea. It is potentially possible to create DDI that could be read and processed by the program to generate it.
- However it was very Stata oriented in terms of standard weighting. Had not really broached the subject of complex sample weights (stages, strata, etc.).
- There is a group proposed to extend SDTL to describe generated data through regression etc. using complex weights. Working with experts in this area. Looked at Stata commands for complex weights and the package for R use of complex weights. Made a list of needed content. Came up with a model that could be used by DDI and SDTL that would cover what was needed. Document provides examples of how these would be expressed.
- Stata can actually write these JSON commands.
- If we are interested in putting in a full-scale description of complex survey weights this would be a model. It still needs some validation. The only way to be sure is to run them with some data to validate.
- Is it package centric? Is there a need to create new content for new or other packages? Statisticians agree on what these things are, so a package can be mapped to this content; for example SPSS writes its own XML document which would be compatible.
- What is the role of Codebook in terms of guidance.
(Should be reviewed for Lifecycle - with this new content) - Flavio is following up on this - One approach is to identify all the different parameters that were used and describe them
The other is to provide the code to be used for analysis of data for different packages.
Surveys provide a description - to what extent do you need actionability or interpretability to clarify what is needed
Codebook: what is meant by "compatible content" within the definition of what Codebook is intended to relate
- KEY CONCEPT: Descriptive -- Interpretable -- Actionable
- Level of richness requires use of controlled vocabularies
- Target for Codebook should be Interpretability
- There is vocabulary to the parameter names as well as definitions inside the weights
- Even across disciplines we use the same terms differently. It would be great to have this level of interpretability through controlled vocabulary.
- Another use case that could be possible is to provide tools to help people with these functions if there is interpretable content with that level of preciseness of vocabulary. Having the definitions addresses the problem of obsolete code.
ATTENDEES: Wendy, Dan S., Jeremy, Flavio, Larry, Oliver, Darren, Barry
1) Budget requests finalized - cost estimate
- Only 2 specific items for budget request are TC meeting and Agency Registration update
- Addition of a content negotiation service for RDF vocabularies and CV contents will be noted as generating a requirement for funding but is dependent on where it is hosted and is not something the TC can specify at this time
- Review of document provided on proposed improvements to Agency Registration (this will result in a budget request)
- Agency registration: #3 there is an RFC for service discovery. Just an alternative way to find a JSON description of agency services locations (file in a special location)
- If someone has a link to your website and wants to know what agencies have registered (map domain name to agency and publish it on your own web site)
- Agency registration is needed for creating an ID space in DDI
- Should promote when updated
2) Update on XKOS activity
Best Practices and push on usage
- in IUSSP - CODATA working group on the "Ten Simple Rules" for vocabularies
- Digitization of UN standards: launching a Proof of Concept on using XKOS to represent the classifications published by the UNSD (ISIC and CPC mainly)
3) Slide deck for members presentation
reviewed, corrected, ready to send to Jared
4) codebook content review (DDICODE 62 41 72 76 74 51)
Question arising on use in catStat as well as catgry - review discussion, logic, use case, and decision
change locations: codebook-3 entered 2021-05-17
41
<xs:attribute name="access" type="xs:IDREFS" use="optional"/>
<xhtml:p>The attribute "access" records the ID values of all elements in the Data Access and Metadata Access section that describe access conditions for this variable. </xhtml:p>
8375 added "and Metadata Access" to documentation of "var"
1371, 1397 catgry
8173, 8196 valRng
4582, 4594 invalRng
1256, 1286 catstat
7644, 7658 sumStat
6945, 6955 stdCatgry
3004, 3016 dataItem
check issue for comment that the code for category should cover the catStat. ####
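A minimal Python sketch of the kind of secondary check discussed for the "access" IDREFS attribute above: verify that each value listed in an access attribute resolves to a declared ID elsewhere in the instance. The file name is illustrative; element tags may be namespace-qualified in real instances.
import xml.etree.ElementTree as ET

def check_access_refs(path):
    # Collect every ID declared anywhere in the instance.
    root = ET.parse(path).getroot()
    declared = {el.get("ID") for el in root.iter() if el.get("ID")}
    problems = []
    for el in root.iter():
        refs = el.get("access")
        if refs:
            for ref in refs.split():  # IDREFS values are whitespace-separated
                if ref not in declared:
                    problems.append((el.tag, ref))
    return problems

for tag, ref in check_access_refs("codebook-instance.xml"):
    print(f"{tag}: access reference '{ref}' does not resolve")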
ATTENDEES: Wendy, Flavio, Jeremy, Dan S., Larry, Darren, Jon
AGENDA
Funding requests due into Jared May 24
--revised meeting funding request
--How the URI are received and ICPSR support for resolution
--Agency resolution work
--Should be part of the common infrastructure - can we put the costs that we are aware of i.e. what may need funding and a possible indicative cost (8 lines of config file have been provided)
Dan will write up description and send it out to everyone so we can look at it
Presentation to Members meeting - base on SB presentation for next year and add past year work including:
--3.3 published 4/15/2020 so actually end of previous year - remove
--2.6 work
--CDI publication work is proceeding
--Pages - products and subset of Learn
----Product section revised: standardized information set for each product, Current Products overview and Developing Products overview
----Learn/Resources: Markup Examples reviewed and placed in a separate database linked from each product version page as well as a general searchable list
----Tools - began review of contents, underlying database, and filtering issues
----Glossary - discussions regarding content and upkeep
----Relationship to Other Standards - reviewed content and where possible links should go
----DDI Profiles - reviewed; considering approach for this work
----Resources page: content and links to sub-pages as well as other pages in different locations
--SDTL published 2020-12-01
--DDI coverage and conceptual model initial work
----defining each product within the suite in terms of coverage area and application support
----conceptual model of DDI coverage area (draft - model and initial work on mapping product pieces to model)
--Initial review of the COGS work done and are revising the ingest text
Notes on clarifications for work plan and any document to groups outside of TC
--enumerating more specifically
--group under thematic area (clear disaggregation)
--anything that provides more clarity on why something is being done and what it tries to achieve will be helpful
ATTENDEES: Wendy, Jon, Oliver, Dan S., Barry
AGENDA:
Funding request: Meeting? Topic?
--Process: Once Jared has requests the SB will be reviewing those related to scientific workplan/agenda; providing some comments on priorities in terms of the strategic plan
--TC face-to-face meeting - first half of 2022
could align with NADDI as we normally hold in Minneapolis
Oliver, Darren, Jon, Johan - overseas
Flavio, Larry - local
We need to meet - note items in the roadmap (2.6 and 3.4) 2.6 should be out by then
Describe an area of work - focus to be determined
We need to have a limited set of objectives so that we can get a focused set of work done rather than being too broad
--Cost of hosting a resolution service for RDF vocabularies and CV content
--Cost of hosting COGS system (probably a next year piece)
--HTTP base resolution - improvement to the software (a week or 2 of work) - Dan will write up for next week
Response from DDI-CDI
--sounds like a reasonable approach
--we will expect to have information on current processes and the profile of UML objects/features used so that we can begin discussing this with MRT once they have the package ready for the voting and publication process. Note that with Flavio, Larry, and Oliver on both committees we can obtain some of this information informally through them.
Resource Pages
--Discussed general approach
--Noted that an important process will be sorting out current content to the appropriate location (tools and examples are currently a mixed bag of content). This includes organization of underlying databases (number, coverage, content) to best support content and access
ACTION: (Wendy) As more work is done on resources pages I will create a page on the wiki to present and discuss approaches
UML modeling in CDI
--Oliver wrote the CDI to XML tooling - repository link - Achim took over work so not sure if what is published is up to date
--Separate UML models independent and dependent (ex. XML or RDF specific) structures - Flavio would be the person
--Modeling only done in independent and transformed to dependent
NEXT WEEK:
Funding requests - have estimate and draft statements ready for discussion
ATTENDEES: Wendy, Jon, Oliver, Darren, George, Dan, Flavio
Preliminary discussion of DDICODE-52 please read the proposed solution
Note that this is focused on the use of weights in analysis
--Stata has the sloppiest syntax of the set
--SAS has a lot more content on weights and more moving parts
--SPSS: look at the CSPLAN content, which is more detailed but convoluted
--Does it need to be referenced/described by variable statistics? can statistics point to these? do these complex approaches apply to summary statistics? If the variable is tagged with the weight that it should use it is assumed when there is a single set of weights.
--Does there need to be something to clearly differentiate weighting approach
--There is work with MIDAS to determine how to present which weight was used
George getting in touch with Stas regarding SDTL work
Use of Slack in DDI
- would it be more appropriate to have in JIRA - we can integrate Slack into JIRA
- If someone posts a question on Slack someone can cross post
- Jon is on and will keep an eye on Question section to respond or file a JIRA issue when needed
- Slack can be used in a browser
- You can attach Slack channel to a JIRA tracker
Tools page update - ideas for content and maintenance
- EDDI, NADDI, and IASSIST requests to file their tools
- there are things that are long dead - add an updated field that indicates that the person was contacted etc. Jon is willing to do this once a year. So update...3.0 cannot be implemented
- No longer available status ...what is the point of having it there
Implementation of maintenance - Note to DDI-CDI group
- Is this a process we need to talk about from scratch?
- No there isn't anything specific
- Bring in the broader discussion
- Bring up the material from the Dagstuhl meeting in October 2017
- There was a UML model originally for Lifecycle but there was the problem of keeping in sync
- UML centric CDI model actually ended up with a platform specific model and platform independent model
- Right now everything doesn't need to be done the same way but we need to know what the process is and how to transfer it between groups
- Cloud solutions for tools in the future for version control etc.
ATTENDEES: Wendy, Jon, Dan S., Darren, Oliver, Barry, Flavio
https://docs.google.com/document/d/1DuYMk2n8GSfbxjlEjweDuw42Y29Ork2npt6Oq-jQOyQ/edit?usp=sharing
varRange - assumes that the continuous variable ranges appear in the order they occur in the data file; not always the case. Documentation must clarify that this is useful ONLY in conjunction with physical layout information for a specified instance
57 has nothing to do with SDTL but with the SDI work. Notify George.
CDI vs MRT
What is the remit? How does the remit of MRT overlap with TC as described in the ByLaws
CDI needs to look at the longer term maintenance in terms of tooling and how changes are recorded and changed
Right now changes are done by one person or funnelled through one person. This was what we were trying to get away from. What do they envision their process will be like in the future?
Rapid prototyping was the goal and why we are switching to COGS.
XKOS is a bit different as it is pretty open through the gitHub tools
It is under version control. Could put it into different formats.
Jon will draft up some notes on this so we can get it to CDI as soon as possible for them to respond to, so we go into publication with an idea of where this goes.
ATTENDEES: Wendy, Darren, Larry, Jon, Michael, Oliver, Flavio, Dan G, Barry
Discussion of providing a means of resolving DDI RDF vocabularies (currently CVs and XKOS) down to the vocabulary instance level. Darren and Oliver will both be available. Goals of this meeting are to clarify exactly what is needed and a process for achieving this in the near future.
Controlled Vocabulary Issues:
Currently, notional URIs exported as RDF (SKOS) content from CESSDA Vocabulary Manager do not resolve anywhere e.g. https://ddialliance.org/Specification/DDI-CV/AggregationMethod_1.1.html#Sum . Secondly, these are HTML URLs rather than URIs functioning as resolvable persistent identifiers. Thirdly, there is currently no RDF endpoint infrastructure at ddialliance.org to support URI resolution.
There are multiple problems including the delay in getting new versions published (example used was a version 1.1 where most current published version is 1.0)
DDI system currently does not provide a means of resolving to a vocabulary instance endpoint. We also have minimal clarity on what should be provided as an endpoint to an RDF vocabulary (html product https://rdf-vocabulary.ddialliance.org/xkos.html) or a .rdf expression such as Terse RDF Triple Language (Turtle)
Getting to the end point, the object is still a problem (RDF end point going to an item)
RDF Endpoint:
Goal is to get a snippet of RDF data, whether by direct query or as embedded RDF in HTML pages.
HTML is not particularly useful for developers
Needs infrastructure to present an RDF endpoint to resolve URIs, such as Apache Fuseki
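A minimal Python sketch of what "a snippet of RDF data" for one concept could look like, assuming the SKOS export of a CV is available as a local file; the file name and concept URI are illustrative, and rdflib is used only to show the idea, not the eventual infrastructure.
from rdflib import Graph, URIRef

def concept_snippet(skos_file, concept_uri):
    g = Graph()
    g.parse(skos_file)  # rdflib guesses the serialization from the file suffix
    snippet = Graph()
    for triple in g.triples((URIRef(concept_uri), None, None)):
        snippet.add(triple)
    return snippet.serialize(format="turtle")

print(concept_snippet(
    "AggregationMethod.rdf",
    "https://ddialliance.org/Specification/DDI-CV/AggregationMethod#Sum"))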
2 things to be solved:
1-CESSDA tool needs a proper resolving mechanism for what is within the tool
Oliver will address that in a team meeting at GESIS regarding understanding of RDF stores
Someone else needs to help Sigit to identify and implement the right entry
2-What do we need to do to publish the DDI vocabularies
DDI vocabularies need to get published on Alliance end (they could be managed and edited elsewhere prior to publication)
DDI needs to support resolution of Alliance URIs
We need to port everything out from CESSDA Vocabulary Service as a SKOS or SKOS-XL file.
DDI uses the CESSDA CV manager as a means to edit and manage versions of DDI CVs. DDI is responsible for the publication and resolution of the DDI CVs.
CESSDA CV manager needs to be fixed
The standard export of SKOS has some internal issues
Original CESSDA vocabulary has a URN that resolves within the CESSDA system (but only to an HTML page). The data model currently in use at CESSDA will be challenging to transform straightforwardly into SKOS(XL) structures. Additionally, the object model is predicated on the idea of one CV object per language per version, which does not completely align with DDI treatment of language as an object property.
The client capability of the CV service is that the client carries the traffic for the URI
The URI being generated into the RDF could resolve to an endpoint outside of the CV service
If we set up an RDF container just consuming the RDF that gets exported from the CV manager and give that one to client we could route back to the CESSDA system for resolution. Set up a proxy rule for setting up the way to tunnel to the CV service. However, this does not solve the issue for other DDI managed RDF products such as XKOS or RDF expressions of other products. This will be a growing issue.
Would ICPSR be willing to set up an RDF container
Currently DDI uses static HTML and XML files on a server. The current server is set up only to support Drupal pages
It is a Linux Apache web server
The web site publishing could work in the same way to static files - SKOS to update into HTML and other formats
The whole point of DDI using the CESSDA tool was for editing and version management support which in the past was a fully manual process.
We need to create a native environment that supports resolution of RDF endpoints; separate RDF store able to resolve DDI Alliance URL/URI
--We need to have an RDF store. The question being WHERE? We had a cloud server at one point for Lion
We need to talk separately about editing and publishing. Editing will continue on CESSDA tool for now as it meets our needs. Changes in the system, particularly output changes, would need to continue to meet the needs of the DDI CV group
Suggestion for approaching this:
- Our discussion is mixing editing with publishing. Currently editing is tied to CESSDA CV tool functioning. TC needs to focus on publishing and what that needs to include (in addition to static pages) - can someone write down what infrastructure could act as an RDF store and publisher
This would help ICPSR to determine how and if they could host it - see who is prepared to host such a thing
RESOLUTION:
Clarify performance requirements, workloads for a publication and resolution system
CESSDA remains the editing and version management platform
ACTIONS:
- Darren will draft up requirements for software stack for publishing endpoint - before next Friday SB meeting. Oliver will assist. The document should explicitly mention Apache modules that can be used to support this work
- There is a concern that the data model in the CESSDA tool is not able to publish in the format we need. TC will need to monitor this and be prepared to provide alternatives.
- Michael works primarily with static web content. He would need to bring this to Jared in order to make the case to get time and developers to work on this if ICPSR is willing to provide this service.
ATTENDEES: Wendy, Dan S., Oliver, Flavio
Reviewed the entries in Codebook bitbucket related to CV and text consistency work. Dan will do a more detailed review and validate. The next areas of Codebook work are some specific issues for extended content.
Went through first draft of document commenting on workplans submitted to SB. Added some points. Wendy will update and post for additional review and comments. Sending to SB next week.
ACTION:
Wendy will contact Darren for availability next week to have a discussion of steps needed to get a resolution system for RDF content supported at DDI Alliance. If he is not available we will set up a separate meeting.
ATTENDEES: Wendy, Dan S., Darren
Discussed the Codebook changes for CVs and added notes on conclusions
Controlled Vocabulary issues - DDI - Confluence (atlassian.net)
Looked at CESSDA SKOS extracts at
https://vocabularies.cessda.eu/vocabulary/ModeOfCollection
Information will be transferred to specific DDICODE issues as relevant
ATTENDEES: Wendy, Jon, Dan S., Flavio, Larry, Oliver
Update from SB meeting
URN resolutions - This should be high on our list in terms of the technical work to achieve URN resolution for CVs and XKOS. SB is looking at some of the broader policy issues regarding the level of support the Alliance should provide. What this would involve and cost.
2021/22 activities - provide some more detail in terms of things that need to get done; set some priorities
Reviewed the TC presentation identifying priority issues for TC. Wendy will relay in review of all group plans and development of overall plan from SB for approval at Scientific Community meeting.
Additional information on web page work including content model for DDI overall
- Showed Training Group organization chart for the pages under LEARN. Helps to inform where the pages TC is responsible for will fall. Checking with Jane on some specific questions about the former Getting Started page and responsibility for the parent page Resources
- Discussion of Mapping and definitions of different levels of mapping (what is mapped, how it's mapped, what the mapping is used to support) in the context of the Getting Started page content and what the TC is doing with additional levels of the Overview of Current Products page.
- Wendy will incorporate notes from discussion and post for additional comments. See Mapping DDI
NEXT MEETING
North America goes to DST on Sunday affecting meeting time next week
No items this week to follow over to next week
Jon will check for agenda items - early next week.
If no agenda items the meeting will be cancelled
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Oliver
Getting Started Page discussion:
- What is the current purpose of the page
- Is anyone looking at it (send Barry a request for page usage statistics)
- Task oriented
How might this page be used?
- New colleagues - starting to work with DDI data: what would you like to have here
- Developers - For getting started developing with a particular product - would this be better served in another area say through products with developer information which is more product/version specific
- Business deciders - Does DDI cover what I need to do?
- Boss said to use DDI and so they need to find out about it
- Or new in this job and want to provide a place to go to get general impression of DDI
- Assuming we are targeting people who are doing a specific task and are coming to the web page with zero knowledge
Conclusions: This page is intended for a person with zero knowledge (new potential users), so what do they need and how might this be presented?
- High level UML-ish picture of what are the main things DDI works with (studies, questionnaires, data sets, question banks, variable banks, content management)
- Describe with some prose - what is inside the box that is labeled DDI what can you expect from that
- List of use cases -
Information needed to describe task work:
- What products can be used - how each supports
- What fields - why, how to use them, how others use this information
- tools and resources should be linked with filters not repeated
This cannot just be refreshed; it needs to be reimagined in the context of the current site. Need to look at what we are trying to do here - get a clearer idea of what this is trying to accomplish and how it ties to the overview with the application perspective
ACTION: write up an approach to reorganizing
ATTENDEES: Wendy, Dan S., Larry, Oliver, Flavio
Codebook content review:
line 4604
It's set as mixed; why is that? It should just be a string with attributes - remove mixed="true"
Avoid mixed content where possible
language code attribute provide source code - will this change
What else should be going into except the code - what about the language
why is the spec number the value rather than the attribute for the code and name in script
General text description plus code: change the example to "English as spoken in Canada" with code en-CA
Follow-up note (email to members)
Language is mixed="true" because its restriction base is simpleTextType mixed="true"
All of the elements in citation have this same extension base.
See below for the chain of element types in this tree.
Should we stay with the same structure for consistency or change the extension base see below:
simpleTextType mixed="true"
    restriction base="abstractTextType"
    choice ..n:
        PHRASE
        FORM
        xhtml:BlkNoForm.mix
abstractTextType mixed="true"
    restriction base="baseElementType"
    xs:any namespace="##targetNamespace http://www.w3.org/1999/xhtml" 0..n
    xs:anyAttribute namespace="##local"
baseElementType abstract="true"
    attributes=GLOBALS
Workplan to SB: areas of emphasis
Integration across models emphasizing
Inclusion of common functionality - where to integrate from CDI
Moving content across products
Recruitment of new additional members both on TC part and SB part
- more people, more use cases,
- technical contacts
Review Moving Forward content and organize for easy mining of content and discussion
CV work update:
CVs need to talk to CESSDA (Sigit) about the update and changes to finalize the tool
Talk to Michael, and cc Darren, regarding the REST interface or use of the CESSDA tool for resolution - need to explore current state
ATTENDEES: Wendy, Jon, Dan S., Larry, Flavio
Review of document outlining Codebook development rules
Clarification for document clarifying work rules for Codebook: add "When the instance is changed to reference new schema"
Action: update, post on wiki and inform Codebook group for additional comments
Discussion of next fiscal year and 2-3 year work plan for Scientific Board due 2 March
DDI-CDI review, vote for publication, and publication if approved. This has been shifted from the last quarter of the current fiscal year due to shifts in the DDI-CDI workplan following feedback from presentations/review.
Codebook 2.6 review, vote, and publication if approved
Long term discussion on the Codebook future structures
Bring in Roadmap information on Lifecycle 3.4 and plus
Is Lifecycle moving to modeling base - COGS is the development tool which will generate a UML model, XML, RDF, JSON...others?
--Sequences COGS has an attribute on every property if it is sequential or not and will then use the sequencing feature of the output binding
--Choice, substitution groups per se not in COGS (it includes inheritance which could be presented in
Clarifying statement regarding use of COGS and the implications, i.e. supporting a more model-based approach to modeling supporting various expressions
Comparison and mapping work
Include sentence or two about HTTP based resolution process in our workplan for next year (see agenda from meeting in Minneapolis for additional points)
Roadmap document:
https://docs.google.com/document/d/1ivEv7dGIDne7dAYXo0sMT10UBIXjr_EIIbIMbD8eCgE/edit#heading=h.6p5wwqkk33jl
Long view 2-3 years out:
Get feedback from people using CDI to see exactly where and how its being used and who that community is
Overall, how do the products work together, in particular how does CDI fit in
How do advancements/changes in one product affect other product development
By definition CDI has a goal of integration of other standards - how does this impact current standards and their development
The role of CDI is as a hub for integration
The role of products needs to be clear use case driven rather than content coverage
We need to develop the game plan or road map
Relatively urgent question in terms of funders - is it a stand alone thing, an integration thing
We need to have a discussion of how products work with each other to meet overall goals
Put in something of the TC perspective regarding each product
Mapping is a TC level thing
Review of Moving Forward work to identify areas of improvement (Questionnaire, data description, geographic description, separation of logical and physical clarification and simplify, descriptive content for codebook)
Recover the discussion of issues over time and layout points and decisions over time
Align different product implementations over time - how does that work and what does it look like
ATTENDEES: Wendy, Dan S., Jon, Jeremey, Oliver, Flavio
COGS:
- Cardinality - dealing with 1..1 or 1..n choices - how should these be handled
- Use a superclass (ex. Agent) where this can solve the problem
- You can relax 1..n to 0..n in most cases. Requirements should not have to be enforced by the schema except in identification or reference objects
Discussion:
A problem with overuse of required content was that it was impossible to serialize without a declared relationship, so doing production work was difficult. Constraints should be added by the producer to check at different levels of production; enforce by profiles or other means rather than through the schema
There was a discussion about mechanisms for this at Dagstuhl - Profiles are more documentary, Schematron is a means of verification checking (ex. Relax NG, https://www.w3.org/2012/12/rdf-val/SOTA)
Consensus was not to do it in the schema
Add to high level documentation - validation information - this section should also include issues of secondary validation for ID uniqueness etc.
Decision: How to handle minOccurs="1"
- 0..1 if we can have a describable base with enumerated and described representations
- check current hierarchy (identifiable, describable, additional like Agent); look for representation etc. as ways to do this
Inserting this kind of typing doesn't add to complexity but allows disaggregation into logical sets that make sense - Reference would be limited to specific types
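A minimal Python sketch of enforcing a requirement at the production level rather than in the schema, as discussed above: a small producer-side "profile" lists minimum occurrence counts and an instance is checked against it. The profile contents and file name are illustrative only.
import xml.etree.ElementTree as ET

# Illustrative producer profile: element local name -> minimum occurrences.
PROFILE = {"titl": 1, "IDNo": 1}

def check_profile(path, profile=PROFILE):
    root = ET.parse(path).getroot()
    failures = []
    for name, minimum in profile.items():
        # Rough count by local name; tags may carry a namespace prefix.
        found = sum(1 for el in root.iter() if el.tag.endswith(name))
        if found < minimum:
            failures.append(f"{name}: expected at least {minimum}, found {found}")
    return failures

for message in check_profile("instance.xml"):
    print(message)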
Physical Structure sections need full review:
General review physical structure options
Substitution groups have also been excluded in some cases (variations of nCube physical structures not included); record layouts also addressed in this way
Check Best Practices to see if these can be clearer
Options for retaining Provenance between 3.3 content and COGS (capturing specific changes)
There is a column for information on choice but CSV was meant to contain information for processing output (rather than input changes)
Add another file to item type directory to cover provenance information rather than the CSV
Moving Forward Project outcomes:
Quite a bit of improvement was made in areas like questionnaires and the abstract underpinnings of representation
There is a need to pull out a lot of useful things that came out of the MF project; SB should go through the whole of it
TC should go through and identify things that can be brought into Lifecycle and/or Codebook - put on future agenda
NEXT WEEK
Codebook - development rules
Set up changes being made and deadline for review
ATTENDEES: Wendy, Jon, Larry, Dan S., Flavio, Barry, Darren
Codebook Best Practices -
ACTION: Darren will work on draft, Wendy will provide background material to date
Identification
Text
- use of CDATA to wrap content
- encoding of text within text fields
- Formats that may be the content (is there a list) - (may file new content)
- Look at what's been added to structured string in Lifecycle
- Strategy for interoperability
- How to make sure content is still portable within the DDI world
Controlled Vocabularies
- In addition to text content use of concept in text string
Review of Codebook changes
- How do we want to go about this?
- For approval process - what are the guarantees about forward and backward compatibility
- Wendy will draft this up; it needs to be a first step (design rules)
- Use of JIRA and Bitbucket
- Having multiple people in the room - multiple approvals
- Final review in meeting - Wendy, Darren, Jon, Flavio, [Dan]
- Providing deadlines with requests for review
- Get started on this today (design rules)
Updates:
Scientific Board
--Need to put the workplan into SB earlier than normal
COGS work
--No specific progress
--clarified issues and priorities since last time we discussed
Content modeling
--Statistical Classification - almost done
--Variables
--Working on process and presentations
Examples
--Darren will look at
UPCOMING areas of work:
URI resolution on CVs - Darren will draft this up for some future meeting
--Wendy provide issue and minute links to Darren
--Difference between HTML pages and RDF - may need to pull Michael in at some point
--example of confusion around source identifiers
https://ddialliance.org/Specification/DDI-CV/AggregationMethod_1.0.html
Resolving API endpoints for agencies - well-known endpoint
--HTTP based API endpoints for an agency
--The agency has to resolve the URI past that point
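A minimal Python sketch of the well-known endpoint idea: fetch a JSON document from a fixed path under an agency's domain that describes its DDI services. The path and the JSON keys shown are assumptions for illustration, not an agreed format.
import json
import urllib.request

def agency_services(domain):
    # Assumed location; the actual well-known path would need to be agreed.
    url = f"https://{domain}/.well-known/ddi-agency.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

services = agency_services("example.org")
print(services.get("agencyId"), services.get("urnResolver"))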
ATTENDEES: Wendy, Jon, Oliver, Dan S.
Codebook Best Practices
Technical aspects to support interoperability on tools and versions.
There is some level of guidance within organizations (CESSDA, ICPSR, IHSN)
We are not defining application profiles (content to put in), our focus is the structure and technical aspect
--Identification
--Text content
--Controlled Vocabulary usage - clarification
Specific points noted for coverage:
- Have you any existing guidance documents? Any sore points where problems have arisen for interoperability or specific use?
- The ID system in version 2.1 is for navigating a document, not for uniqueness - need to express what this really is, which is what DDILifecycleURN and DDICodebookURN are for - what is the DDICodebookURN
- Clarifying how identification works, and clarifying that internal document links are not persistent
- It's a mixed bag of what is used and what they mean - clarify the implications of how these items are used
- May want to look at some of the text content and how we suggest people do the xhtml when you have other options such as PHRASE and Content, for example in Nation
- Its an open standard so people can do what they want. We can tell them the issues related to how something is being done.
- Point out places where there are inherent interoperability problems you should be aware of
- Example of stating what markup is used, for example if you need to take out XHTML content and maybe move it to JSON
- CDATA - previous usage
- Backward compatibility - what do we mean by that. What kind of guarantees are we actually making? At least this should be in the high-level documentation.
We can definitely say things are deprecated (xml-lang example); xs:appinfo could maybe be used for deprecation for the assistance of implementers and processing tools. We could use our own notation for it.
Do we need to individually nudge some people: IHSN (World Bank)? Let's wait to see what we get as those we noted were all on existing lists
ACTION ITEM:
Write up email soliciting information and send out (wlt)
Background information:
Notes from Codebook meeting:
Best Practices Lifecycle example
https://ddialliance.org/sites/default/files/DDI%203.2%20Best%20Practices_0.pdf
A technical Best Practices in the area of identification, language, and places where there are multiple ways to handle content, to provide a best practice to ensure interoperability on tools and versions. Also of concern is consistency within the community.
Feedback needed:
--Are there known issues in terms of interoperability?
--At what level are they describing for example specific fields such as CV use or just general? There are certain structures in codebook that are confusing.
--Focus is the point where it affects interoperability - bleed over a bit into content.
ACTION ITEMS:
--TC draft up questions to collect information
--Top 10 issues
--Content vs Technical
Notice on TC agenda sent to group:
TC, yesterday the Codebook group discussed the value of a Best Practices document and what it should contain. We also discussed the process for obtaining feedback on what should go into it and where there are specific problems that need to be addressed. Dan S. was on the call and provided a clear description of the overall focus of such a document. The approach that was proposed was for TC to draft a document for the purpose of getting feedback from Codebook community. This should include a clear definition of focus and what type of information is needed from the community in terms of specific fields and problem points that affect interoperability and consistency within the community. While the focus of this Best Practices is technical rather than application based, we need to ensure that any feedback that is outside of the coverage of this document will not be lost but will inform the content of new high level documentation that is also being planned for Codebook. At our meeting tomorrow I would like to draft out the coverage of a Best Practices document as we see it now and a draft of request for feedback/content gathering that can be sent to the community. This request should go out to the Codebook group as well as to the DDI-Users list and the IASSIST list (the original groups from whom the Codebook group were solicited).
ATTENDEES: Wendy, Dan S., George, Jon, Larry, Oliver
AGENDA
Markup Examples:
--What is the purpose
--Reflecting Best Practices
--Coverage and creation
Background info:
--Lifecycle has a Best Practices document
--Codebook does not but there has been agreement in Codebook group that high level documentation including best practices is needed
--SDTL has noted issues that cause problems for interaction with individual sets of Codebook examples
SDTL experience:
--Inserting into DDI 2.5 designed for NESSTAR 1.1.4 raised some problems
--Where it works and where it doesn't for example in terms of SDA products
--Think of this in terms of projects and what's needed for them to work
Purpose of Markup Examples:
1) Problem of migration between versions
--A bit nervous about migration activities
--Explaining how say concepts and variables lay out in different versions (strengths and weaknesses)
--Examples of reuse? To illustrate strengths weaknesses, purposes
2) Strengths of different work products
--Examples of non-intuitive linkages (questions to instrument flows or to development work)
3) Use of examples
--What makes a good example - What should go in what slots
especially can be a problem with codebook and code abuse
--Use of new features - pedagogical use
--From 3.3 onward its intended to be created by machine
--Bridge the gap between development and business sides
Related issues/questions:
--What about putting out examples of versions that are not recommended (3.0, 3.1) - these exist but could be tagged in some way
--If we are soliciting user content - better for the Alliance to create these linked with product production
--How would we curate user examples - without prejudice
--Dedicated examples and generic content might be useful for markup examples themselves
--Practicability of generic content - but based on real world
--Training group is quite a long way off from requesting or providing training packages
--If you're starting from scratch, markup should reflect best practices for technology development if nothing else
--Ability to reference appropriate object - fragment object referenced by some organization structure
--Keeping in mind where Lifecycle is going (serialization) and where/if Codebook is going somewhere
Best Practices needs to be propagated for products especially Codebook - help to steer with migration problems
--For migration purposes
--Push a bit more on best practices on Lifecycle where we actually have some
--We are running into this as different infrastructures working with Lifecycle are talking to each other more. There is not a lot of shared understanding of how we work together in terms of a minimum set of Lifecycle. Different conceptions of what they need in different organizations. Best Practices could help to mitigate these discussions.
--The sort of fragmentation that had happened with Codebook where each location used things in different ways
ACTION:
--Focus on Best Practices for products particularly for Codebook coincide with new release of Codebook - Wendy
--Markup needs to flow from the Best Practices and need to provide examples of features
--Write up a summary of this discussion for further action - Jon
ATTENDEES: Wendy, Larry, George, Oliver, Barry, Dan S., Jon
Check-up on current work plans identifying status, action during January, and specific actions needed.
- CDI - need to follow up with Arofan, no meeting activity in November/December
- SDTL - will be working on capturing regression and other statistical processes. Capturing inputs is easy but need to describe the model. Working with others to capture the ontology (example: identifying 5 things in an ontology you need to know)
- Markup Examples
- Changing to 2 separate databases (Tools and Examples)
- Looking at set of ICPSR 2.0 examples of DTDs for integration into regular database https://ddialliance.org/resources/markup-examples
- COGS up and running and will begin going through list of issues this coming week
- Year end report should be coming out soon
- Codebook test version of citation changes sent out and meeting will be set up this month
- Web based resolution: Workplan to work on the web based resolution, what is the priority on this?
- Wasn't there some issue with CESSDA on this? CESSDA needs to make a resolver because DDI does not support an item level resolver.
- Do they have the knowledge and understanding to do that? Do we provide them support for this? - This is an issue with their tool; it is not an issue for the Alliance and the TC. Have we responded to them? (ACTION: go back and check to see if there has been communication on this - WLT)
- ACTION: Achim check in on update of document - that is what we're waiting for
- CV's move to alliance website - Oliver will follow up in January
- Oliver needs about a half day to work on this - then talking to Michael
ATTENDEES: Wendy, George, Larry, Barry, Dan S.
Pages under Learn:
Tools page:
- Change from Version(s) supported to Product and version
- Remove all examples of Lifecycle 3.0 because it was unimplementable due to problems with referencing. 3.0 was out for a short time prior to 3.1 as it exposed implementation problems. Also note that anyone wanting to implement 3.0 should use a later version.
DDI Profile page
- Is that an appropriate title
- Implication is that it is use of DDI Profile but they are really examples of use. Should they be element sets also expressed as profiles?
- Determine what this should be and how it should be organized. How do we want to organize the content after determining what the page should cover
- Maybe be more general about what a DDI Profile means
- Is the profile useful to describe usage? Maybe Schematron is an option here
- Clarify that what this page actually contains is application and use examples by the DDI Community (different from an official best practices or examples page)
- Revise text at top of page
Where does this heading for Survey Metadata belong? New sub-section? of what? who should manage? Jon's content so ask him for suggestions. The endorsements buttons go with this - This has a 3.0 profile at the bottom
- These are user supplied so "Recommended element sets" are really examples used by various organizations
Relationship to Other Standards
- currently just descriptive content (leave in learn?)
- Link to more detailed mapping/relationship
- Who is your audience, where will they be looking?
- Should they just be external standards? Yes
- Could picture the product page having a concept map between products.
- Look at XKOS as a means of expressing relationships (or SKOS)
- Add entry for SKOS
- Remove DDI Codebook
- Expand content description to clarify that it is not just Lifecycle relationships
Implementation Guide page for Products
- Determine what should go into this after sorting out the pages under Learn
Additional comment:
Larry raised use of different browsers which display dropdown content differently - he will file with the website issue tracker
ATTENDEES: Wendy, Jon, Oliver, Dan S., Larry, Barry, George, Flavio
- First set of Products pages out (current products) - still need to update XKOS layout and developing products
- New pages would be Implementation Guidance on the Products drop down with information on best practices within and across products, mapping etc.
- New content model and related work would link from Overview of Current Products and be referenced by mapping information in Implementation Guidance
- EDIT: Look for in-put on overview page and change to input
- SDTL announcement is going out
- Barry and Jared have already started draft of Year End Announcement so we will wait and comment on that. Should be out by end of this week. On agenda for next week
ATTENDEES: Wendy, Dan S., Jeremy, George, Barry
Publication of SDTL and new product pages
- Trying to put out year end summary of what's been done emphasizing what can be done about what is supported for instances with fragments.
- Be stronger in terms of the Best Practices
- Implementation Guide Best Practices for Usage
- Bring up the prominence of this
- Best practices section in SDTL Guide - http://c2metadata.gitlab.io/sdtl-docs/master/
- Add verbiage to Lifecycle Best Practices to emphasize implementation
COGS - how will we manage versioning in COGS? Is it a matter of publication
- In Lifecycle, an official release is published as a static snapshot and that is what is referenced, while the dynamic site continues to develop. We will need the updated links for the Alliance web site
- Example of Lifecycle: https://ddialliance.github.io/ddimodel-web/
- Over the next week we'll update the links
AGENDA:
Year end announcement press release - next 2 meetings
- SDTL - Barry and George will work on this
- lifecycle 3.3
- CDI review
- ISO milestone
PR news flier:
- New organization of Products
TO DO:
Set up google doc with outline of content for additions and comments (in TC Drafts folder "Year End Announcement 2020")
ATTENDEES: Wendy, Oliver, George, Jeremy, Dan S., Larry, Johan
SDTL Publication:
1. What are the next steps to publishing the SDTL standard?
The package that went out for review will be published in the same form
Only the JSON serialization has been tested and is therefore the only official serialization
Additional serializations (XML and OWL schemas) will be added as they are tested
Send page information and URL to George
December 1, 2020 intended publication date
2. Since we published the standard for review, we have made a number of small changes to SDTL. There are a few more small changes that are pending discussion at the SDTL WG tomorrow. All of the changes are listed on this page: https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/1207926786/Pending+SDTL+Changes
Mostly minor changes, for example splitting a single property into 2
As a new product there is not an existing community of use
Documentary additions to usages and norms
Prep for publication:
Building all of documentation into COGS
Section on usages
Announcement content: provide to Barry
Include the webinar link
Training slides preservation copy link
NSF grant information add to page
ARPA filing
Verifying that the document on TC-4 is the most recent. File dated October 30 posted on Nov 5 which is the day the questions regarding the regExp were noted.
Once verified Dan will check and once finalized will be filed.
ATTENDEES: Wendy, Jeremy, Dan S., Jon, Larry, Oliver, George
Webinar at the beginning of a voting period.
- General agreement as to the usefulness of adding this to the vote process. George noted the following points that should be kept in mind:
- Have Jared send out the notification of vote along with the webinar. This is helpful for turnout
- Get feedback on webinar content from TC, development group for product, and others like Scientific Board, Barry, and Jared
- More helpful to have a process for creating the webinar than a profile
- New products can be complicated and involve more explaining of where it sits in the broader ecosystem. Process should include:
- announcing with vote
- getting broader feedback from developers of product, TC, SB, Marketing, Jared
- determine what should be covered based on product; "fit", addressing issues, and due process
GSBPM/GLBPM organization of DDI Suite content coverage:
- Who is the audience? What is the use?
- May want to run by various user groups
- The intent is to cover information content and address functionality separately using this as a reference point in terms of content
- What are we trying to do with this? The variable is a good example
- There is strong utility to something that says "this is what Codebook does, this is what Lifecycle does". Clearly have additional levels to this. Functional is the primary step.
- Might be easiest to pick out the commonly requested things.
- In a competitive world of description there are lots of approaches.
- A really good start but then start in parallel a functional approach for a few specific examples and determine what you need to capture to describe them. Worth thinking through what are useful things to talk about with these.
- Jon would be happy to join on this.
- More generically, is there anywhere with a more academic approach that could do some of the grunt work of comparison for different audiences.
- Contact with information schools might be useful as a project. But it would be a long process because they would have to learn the models of DDI and the structures that it works within. Post-doc or graduate opportunities through say RDA. Talk to Maggie about these possibilities. Dissertation proposal.
Resolution issues:
- Needs to resend information from the spring
- Fine tuning the regular expressions
Announcement:
Over 200 signed up for EDDI
ATTENDEES - Wendy, Jon, Larry, Dan, Jeremy, Oliver, George, Flavio
DDI Codebook 2.6 citation, CV, access
- wlt - draft up new xsd for testing; start on next set of issues
- Hopefully out in November
DDI Suite - coverage work
- wlt/fr - working on content structure
- picking it up again - it's a bit fuzzy still but we are making progress
- General table is clarified and just needs to go up
Mapping within DDI with others
- DDI Suite needs to be completed first
- Jeremy - discussion on NESSTAR and Colectica
- A lot of this is tool related so mapping is just pointing out the pain points
COGS input / output
- jj - make corrections for CSV content
- reverify output from corrected CSV
- determine expectations for output types
- Jon is having someone start looking at it before the end of the month
- Has a list of what needs fixing and do that first
- Then code review to locate other problems
- 2 repositories: you can create issues regarding input to CSV and then CSV to bindings (see the sketch below)
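A minimal Python sketch of the kind of sanity check that could be run on a COGS-style item-type CSV before regenerating bindings; the column names used here are assumptions for illustration and should be checked against the actual repository layout.
import csv

def check_cardinality(path):
    problems = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumed columns; adjust to the real COGS CSV headers.
            min_card = row.get("MinCardinality", "").strip()
            max_card = row.get("MaxCardinality", "").strip()
            if min_card.isdigit() and max_card.isdigit():
                if int(min_card) > int(max_card):
                    problems.append(row.get("Name", "?"))
    return problems

print(check_cardinality("ItemType.csv"))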
DDI Lifecycle expanded documentation
- wlt/jj - work on content
- perennial favorite - try to schedule some time by the end of the year
DDI Alliance pages
- wlt - transfer content to template for currently published content
- wlt - prepare content for products under development, registry, etc.
- Do by mid-November
URN resolution issues
- TC-4 Addition of support for resolving to objects seems to be taking on an enormous job at a cost
- is there a need there?
- What is Achim proposing? If people start to rely on a resolution service
- We can do some things for http resolution
- have a well-known end-point with information on an agency and a templated URL
- A documentation templated url
- An object templated url
- We may or may not have to do the redirection
- Get a templated url for an agency
- What is the technical knowledge needed to interact with what Achim is proposing?
- We need something usable by the community
- CESSDA may want a more complicated solution
- We talked about the possibility of establishing a well known json end-point
- Oliver has no additional insight as he left the development too early to know what they decided on - could ask developer if he knows anything about it. Could also contact Taina for more regarding what she sees as an issue.
ACTION:
- Dan will write a paragraph or two on this which could be used to try and kick start the discussion on json
- TC-218 The ddi-cv was supposed to change to ddi.cv according to the issues that we have and they need to change
ACTION:
Contact Taina regarding the appropriate agency identifier
Priorities:
- Wendy - web pages, Codebook, lifecycle documentation
- Jon - COGS input to CSV
- Dan - restart work on establishing a json endpoint (attempted last winter/spring with only a single response)
- Oliver - clarifying CESSDA's needs
ACTION ALL:
- Read TC-219 and comment as needed
Interesting sidebar:
ATTENDEES: Wendy, Jon, George, Jeremy, Dan S., Larry, Oliver, Barry
Excused: Flavio
Preparation for SDTL Vote:
Vote must last a minimum of 1 month
Link to SDTL page https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/899547182/SDTL+-+Structured+Data+Transformation+Language
Link to product information (see review page) https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/1120370729/SDTL+Review
Include links to X'd items
Important Links:
X SDTL User Guide
Download Package
Overview Documents:
X Introduction to SDTL
X C2Metadata Project
Websites:
C2Metadata project
C2Metadata on Gitlab
MTNA Dataset Updater
X SDTL Working Group
Take links from the slide deck to make sure they are current
SDTL User Guide: http://c2metadata.gitlab.io/sdtl-docs/master/
Introduction to SDTL: https://deepblue.lib.umich.edu/handle/2027.42/156015
Overview of the C2Metadata Project: https://deepblue.lib.umich.edu/handle/2027.42/156014
SDTL Working Group: https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/899547182/SDTL+-+Structured+Data+Transformation+Language
New Product Proposal
Proposals for new standards or technical products may be initiated as working groups of the DDI Alliance. To help the community assess the proposal during review, the proposal must include:
- a complete draft statement of content and functionality
- information about the business case for the proposed product
- the objectives the new product will achieve
- position within the suite of products supported by the DDI Alliance
- identification of a core group to work on the development of the product
- a maintenance plan
Purpose of webinar - Date Nov 2
Barry - the webinar is promoted on the website
https://twitter.com/DDIAlliance/status/1312995938974269441
ACTION:
- Provide Jared and Barry with text for Notification of Vote Nov 1 - Nov 30
- Codebook - adding file derivation property - check and get back to George (from J and Pascal) - inform J and Pascal that we are still open to new issues
ATTENDEES: Wendy, Dan S., Oliver, Flavio
Sounds like they want an HTTP resolution system
DNS based resolution service is a separate thing (to a location or a resolver) to convert to a URL
On the CV issue it seems that any kind of resolution concerning CVs is wanted
They want a URN and a URL endpoint that resolves to a usable URL containing data
CESSDA has a way to resolve a service and specific entities
Dan is correct in terms of the needs of the CV group. A place to provide a URN for a CV
If you register the URN it is the point that resolves it
The registry points to the services at the agency
HTTP based resolution would allow the registrant to put the information into a resolver, or into a pattern that resolves the URN to a URL by tokenizing the individual pieces
Add this feature to the registry site - URL pattern based redirection
Dan and Jeremy will put together a description and estimate of costs so we can present to Executive Board
1) a well-known location for information about a specific DDI agency ID, with JSON-encoded information for transformation
2) a service to resolve the transformation: a pattern-based redirect from URN to URL that the agency has registered (see the sketch after this list)
3) implementation work
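A minimal Python sketch of the pattern-based redirect in item 2: split a DDI URN into agency, ID, and version tokens and substitute them into a URL template registered by the agency. The URN layout and the template placeholders are assumptions for illustration, not a specification.
import re

URN_PATTERN = re.compile(
    r"^urn:ddi:(?P<agency>[^:]+):(?P<id>[^:]+):(?P<version>[^:]+)$")

def resolve(urn, template):
    match = URN_PATTERN.match(urn)
    if not match:
        raise ValueError(f"not a DDI URN: {urn}")
    # Tokenize the URN and fill the agency's registered URL template.
    return template.format(**match.groupdict())

print(resolve(
    "urn:ddi:int.example:Variable-age:1",
    "https://repo.example.org/items/{agency}/{id}/versions/{version}"))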
Is there a set of information that we can provide
Tell Jon to write up information to clarify what is needed to those who are confused.
Should we have a conversation with the CV group to identify their problems
Answer to CV group:
It doesn't work now the way that they want it to. We are working on it. The workaround is to just use the URL directly.
Once the URN resolution is in place then that provides a URN alternative
There needs to be a couple of different entries in the pattern (SKOS, DDI-Codelist XML, web page, etc.)
ICPSR just offers the CV as a bunch of files and does not have a database of terms. They could use the same thing as in the CESSDA CV service URL for CVs owned by DDI CV group. The infrastructure of the resolving mechanism is held at CESSDA.
Daylight Saving change:
Send info out to groups - meet when we have topics
ATTENDEES: Wendy, Oliver, George, Larry, Flavio, Johan, Dan S.
Reviewed pages in product section
- Overview - removed infrastructure requirements column; clarified meaning of last column and added "metadata" to the title; discussed revision of short descriptions at the top
- Development Products Overview page - reviewed contents
- Template - discussed layout and how to address multiple namespaces due to publication in multiple syntax; added naming group responsible for future development; added listing for selected articles which would be maintained by that development group
- Did not get to DDI Agency ID Registry page
ACTION ITEMS FOR WENDY:
- Overview Work Products: make discussed changes and notify group for individual review of updated document
- Create pages for each of the work products (each version) and put up for review
- Post Development Products Overview page so Larry and Flavio can review CDI content
- Post DDI Agency ID Registry page for comment
ATTENDEES: Wendy, Olof, Oliver, Johan, Dan S., Flavio, George Alter, Jeremy
RDF Pages
Some want content negotiation for Turtle (see the sketch at the end of this block)
With an anchor that gets you to the item
.htaccess file - it's in the Disco spec repository
https://github.com/linked-statistics/disco-spec/blob/master/.htaccess
Put Michael in contact with Dan S. if he has questions
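A minimal Python sketch of the content negotiation idea mentioned above: pick a representation of a vocabulary page based on the Accept header. The file names are illustrative; in practice this is usually done with Apache rewrite rules like those in the Disco .htaccess linked above.
def negotiate(accept_header):
    # Take the media types in the order offered; ignore quality parameters.
    preferences = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in preferences:
        if media_type in ("text/turtle", "application/x-turtle"):
            return "xkos.ttl"
        if media_type == "application/rdf+xml":
            return "xkos.rdf"
    return "xkos.html"  # default to the HTML page

print(negotiate("text/turtle, text/html;q=0.8"))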
SDTL
Vote delayed to November
Schedule Webinar after vote is announced prior to actual vote
PRODUCT Pages
The only place with a list of all the products on the drop down
Add an internal link
Come up with one line statements per product
Wendy will write one-liners that people can review
Send out for review by group and Barry
DDI Registry Page - should this be DDI Agency ID Registry
https://registry.ddialliance.org/Agency
Add text to indicate you can look up agencies as well
Product development page
General organized like Suite page with grid and then separate pages
Looking forward to other pages
After product update
Look at what we want to do with those pages - work with Training Group
Overall Coverage Model
Model needs to move up in work plan and see how it fits in
Separating content object coverage and linkages that facilitate certain use
Presentation at EDDI on this
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry, Johan, Oliver
Guests: George Alter
AGENDA: SDTL preparation for publication vote
Target Date:
Scheduling vote mid-to-late October
Support material:
In terms of documentation an SDTL User Guide has been created and set up so that the User Guide is the entry into COGS. 2 articles have been written (on the C2Metadata project and an introduction to SDTL)
Working on publication of the first; the SDTL paper has been submitted to IASSIST Quarterly; a third paper on Use Cases is being written.
Including:
- Description of data transformation
- Integration into PROV metadata
- Speculation on using SDTL for doing translation between statistical packages
SDTL:
- has been meeting monthly
- there is a list of pending changes to SDTL to be completed - 3 of 4 are documentation only
- Other content addresses longer term expansion, post publication
- If there is some other documentation that people would like to see let George know
Webinar:
- Webinar is a good idea
- Hold in the week before or just when the vote goes out.
- George will check with ICPSR for technical support - could post on ICPSR YouTube channel
Coverage:
- Role of SDTL in DDI
- On-line tool that takes DDI and a script as input and then outputs an interactive codebook with natural language description
- What's actually happening is the creation of a DDI Codebook
- Talk about how this goes into Lifecycle, having ready-made locations to support SDTL content
- Looks like a good portion of a webinar is there.
TC To Do List:
- Coordinate with Marketing and Secretariat
- Plan a vote for late October
- Are we going to have to work under the old rules or the new rules - check with Jared
- Check with Jared and Barry on any preparation they want to do.
- Create and add SDTL page to products
- User Guide is being provided through GitLab and versions through that process.
Is it a separate file? HTML page and a PDF generation. Determine how this gets presented on DDI Alliance site - Add to overall products page
- Define a base URI for SDTL
- PROV-ONE would like to have a reference that points to the OWL version of SDTL
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry
Updates on the following topics:
DDI pages - content and layout, timeline
- reviewed suggested edits and accepted with following exceptions
- not just research data - leave without this restriction
- addition about applicability of Lifecycle to description of longitudinal, complex, etc. - will add usage to next column and reword in description to reference design/structure
Codebook - current work status, TC review of work/changes
- At what point should TC review
- Commitment that what is in codebook should be in Lifecycle
- Have to have an approach for how any new item would be added to Lifecycle
- TC iterative review - when sections are sent out for testing, send them to TC for review at the same time
CDI - update on webinars, publication work
- Delay to 2nd quarter 2021
COGS work - where are we? are there things we are waiting on?
- Reference is just a reference
- If they have additional needed content
- Beginning of October - can we frame this out clearly during September so we can address them in October
SDTL
- Progress in the SDTL review and how he wants to proceed to publication
- Talk to George
- No feedback
- Don't want to change during review
- Get an idea of whether a point addition is anticipated to include additional functions in the near/mid future
ATTENDEES: Wendy, Jon, Dan S.
Clean-up:
- TC-215 what is the time frame on this in terms of action
- Labeled for future action
Issues raised in Codebook:
- DDICODE-64 addition of a controlledVocabularyValueURL - is this something to be considered for Lifecycle or as a general part of a CV in the future?
- Added a comment regarding clarity and consistency on what is entered in the string
- DDICODE-69 Import statements don't work
- Done to support local use - comment added
ATTENDEES: Wendy, Jon, Larry, Oliver, Jeremy, Dan S., Flavio, Jared, Barry
Review of DDI Alliance pages assigned to TC
About:
- Content of 2 pages from About should move to Products
Product tab:
Menu Items:
- good
Content:
- review specifics - set up google doc for editing with deadline
- Topics good
- Describe what it does not what it is
- It needs to be sufficiently signposted - "Overview of Work Products"
- Being explicit about mentioning DDI Suite to bring it into common usage
- Reference to About page History of DDI Development from Overview page
- Is there a need for a quick media piece? Maybe not; make sure content is there and then see if there is a need.
- Link this with how it is/can be used
- Drill down into more detail
Layout:
- makes it digestible/usable; good concept; work on layout limitations with Michael
- Table layout may be too wide
- Could change - in terms of depth
- Other standards:
- Move all relationship-to-other-standards content into a grid-like layout
- Include all products
- Language needs to be understandable on these entries - clarity
Tools, Profiles, Markup Examples
- Relationship to other standards should move to products
- Work with the training group on appropriate links from "Learn" content
ACTION ITEMS:
- Throw examples into google doc for editing
- Redo Products under development page in same structure as overview and then move to Products
- Review the work products page from About to ensure that coverage is incorporated in the product description
- Draft up Lifecycle and Codebook by flattening current pages
- Once up Jared can remove those from About
MEETING CANCELLED
ATTENDEES: Wendy, Dan S., Larry, Flavio
TOPIC: DDI Alliance web pages under Specification drop-down
GENERAL THOUGHTS:
- Overview with Suite description and table of 4 areas of description, from which you could drill down
- Distinguish between published ones - those that have RDF, XML, and other bindings
- Separate product pages would be useful
- Flattening within the products - into a single page with latest version and then have links to previous version
- Specification pages have buttons which are clear and direct. Ours are a bit wordy. Make sure direct links at the top.
- Look at how to have links (single, and within text)
THINGS TO KEEP IN MIND:
- Additional information - mailing lists (from rdf page)
- Keeping on the products with lists, issue filing etc.
- box SPECIFICATION should change to PRODUCTS and go to the DDI Suite page
- Do we have any feedback from potential audiences? Check with Barry for page use analytics, paths through the site
PAGES we can look at that are well organized:
- Schema.org has multiple products and breaks into different groups - look at that as model
- Denodo https://www.denodo.com/en
ACTION ITEMS:
- Need to prepare samples on both display and level
- Do layout of products and then determine what other information to include, such as binding-specific information
- Talk to Training about those pages assigned
- JavaScript turned off changes the output - Larry will report
- Wendy will review items in DAWS to see what should be closed.
ATTENDEES: Wendy, Jon, Larry, Jeremy
Work plan for COGS work:
- Transformation issues (collapsed with cardinality changes, relaxed cardinality with required objects, empty objects consisting only of attributes) - a small scanning sketch follows after this list
- Determine if there are one-by-one decisions that need to be made
- Determine which are there for a reason or if there is a decision to be made
- extended reference - just model rather than hard code
- Review complex choice translation - these may require remodeling or hand crafted translation
- CSharp transformation file - see if we can locate someone to work on this if we see it as a priority
- StatsCan and CESSDA are looking at the UML documentation and it would be useful to have corrections made in the not too far future
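A minimal, illustrative Python sketch of one of these checks - scanning a schema for complex types that consist only of attributes. The schema path is a placeholder and the check is a rough aid for review, not part of the COGS tooling:

import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

tree = ET.parse("reusable.xsd")  # placeholder path to a DDI-L 3.3 schema file
for ctype in tree.iter(XS + "complexType"):
    name = ctype.get("name")
    if name is None:
        continue  # skip anonymous types for brevity
    has_elements = ctype.find(".//" + XS + "element") is not None
    has_attributes = ctype.find(".//" + XS + "attribute") is not None
    if has_attributes and not has_elements:
        print(name)  # candidate for the "only attributes" transformation case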
Additional bindings and a different XML binding is a separate topic:
- format of bindings
- discussion of future formats of Codebook etc. and how to address technical limitations
- probably start looking at binding options etc. in early 2021
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Larry, Flavio, Achim (guest)
URN document
- Version still listed as optional - change
- page 6 "the version should be a hierarchical version number"
- what should be written regarding the resource versions?
- It should be unique value within the agency and resource identifier and case sensitive
- Use the same terms in ABNF grammar - use same naming terms used in other, but check on length limitation
Add short term in text next to full term to avoid confusion:
- agency identifier
- resource identifier
- version identifier
ACTION
Corrections will be made and then document redistributed for comment period and then discussion will begin with ABNF
Achim will enter TC issue regarding CV documentation corrections needed.
Cardinality relaxed
The cardinality should reflect actual standard
The transformation tool makes conversions when converting
Abstract to base --- COGS has one identifier system so they were collapsed
They were rewritten semi-manually so there may have been an error
Location of all cardinalities
Substitution groups were handled in multiple ways -
check on substitution
Known problem modeling areas:
complex choices
physical descriptions
Discussion points:
Reference - if we want additional content we need to hand write those - check on what comes out as reference (see settings folder)
Choice - one way to solve choice issue if they were derived from a common base class we can have a single choice pointing to the base - where should we add base classes
Corrections - currently just used for documentation so we can publish any corrections we can rerun and get it up on the site
As we make fixes we can regenerate all the sets
ACTION
Have concrete examples of each - prepare full spreadsheet of issues and post. Also post initial comparison spreadsheet
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry, Oliver, Achim (guest)
Regrets: Flavio
CV tool
progress waiting on completion of API update to finalize
URN work:
see document FinalQuestions.txt and emailDiscussions-week20200702.txt as background
Decisions appear in BOLD
1: Included characters -
https://en.wikipedia.org/wiki/Uniform_Resource_Name#Syntax
Do not include hex digits; the backslash was allowed after a character to support hierarchical names from some systems
namestring = assigned-name
[ rq-components ]
[ "#" f-component ]
assigned-name = "urn" ":" NID ":" NSS
NID = (alphanum) 0*30(ldh) (alphanum)
ldh = alphanum / "-"
NSS = pchar *(pchar / "/")
rq-components = [ "?+" r-component ]
[ "?=" q-component ]
r-component = pchar *( pchar / "/" / "?" )
q-component = pchar *( pchar / "/" / "?" )
f-component = fragment
; general URI syntax rules (RFC3986)
fragment = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded = "%" HEXDIG HEXDIG
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
alphanum = ALPHA / DIGIT ; obsolete, usage is deprecated
ALLOW: pchar, unreserved
Proposal is to restrict to the NSS content, allow sub-delims
Point of the URN is to provide flexibility by allowing sub-delims
unreserved, sub-delims, "@" - i.e., pchar minus pct-encoded and ":"
align each of the portions of the id
PROPOSAL - AGREED with modification to Agency to constrain to DNS label definition as expressed in an RFC document
AGENCY:
pchar (minus "%" ":") - limit further if needed to restrict to the DNS label definition in an RFC document
ID:
pchar (minus "%" ":")
VERSION:
pchar (minus "%" ":")
General approval - present RFC documentation prior to finalization
ACTION: locate RFC document and place in issue TC-4
Further restrictions should be done in the specification. Use example of the underscore in Agency as something that would be restricted in the specification to meet current rules.
Question how to propose a URN and resolution
Not mentioning resolution in submission
Achim: If we have a restricted character set for agency why wouldn't this work for any other system as it seems very stable?
Main questions focus on URN only or also DNS resolution
Are we looking at outside systems
The URN is not restrictive but the DDI standard will always restrict Agency to something that is DNS resolvable. The DDI standard is always defined as something resolvable in DNS.
Look up DNS label definitions in an RFC document to get the allowed label characters
2: Limit to ASCII - AGREED
3: Version as optional - AGREED that version is required (a sketch summarizing the agreed rules follows below)
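As a minimal sketch only, the rules agreed above (ASCII, version required, each part limited to pchar minus "%" and ":") could be checked with a regular expression along these lines. The expression, function name, and example values are illustrative; the normative definition remains the ABNF plus the DNS-label restriction still to be added for the agency part:

import re

# unreserved + sub-delims + "@", i.e. pchar without pct-encoded and ":"
PART = r"[A-Za-z0-9\-._~!$&'()*+,;=@]+"
# assumes the agency:id:version layout of the namespace document under discussion
DDI_URN = re.compile(rf"^urn:ddi:({PART}):({PART}):({PART})$", re.ASCII)

def split_ddi_urn(urn: str):
    """Return (agency, resource id, version) or raise ValueError."""
    match = DDI_URN.match(urn)
    if not match:
        raise ValueError("not well formed under the proposed rules: " + urn)
    return match.groups()

print(split_ddi_urn("urn:ddi:int.example:VariableScheme-123:2"))  # hypothetical values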
COGS Review
Wendy went over XML to CSV validations she's doing covering:
- cardinality
- extension base content (extension base is found in the file structure in a 0 kb file name)
- coverage
- documentation content (element, complex element, use of element)
- identification (transference of content IVM into single structure)
Some issues have been found which will be explored and described more fully and presented for next meeting. Flavio also has found some issues in the UML (XMI) renditions.
Next Week:
COGS review with specific issues identified for discussion prior to meeting.
ATTENDEES: Wendy, Jon, Dan S., Larry, Flavio, Jeremy, Achim (guest)
URN namespace for DDI (this links to document also found on TC-4)
- When final can file with IETF for the appropriate review. Then it can be registered so we can begin to resolve DDI URNs.
- The related issue is TC-4. URNnamespace.docx, Best Practices and examples from DDI-L 3.3 were used to update the earlier document.
- URNnamespace.docx recommended the use NCName for each part which is very inclusive:
Implications -
- Agency ID is a compound of DNS labels so need to reflect that syntax
- Recommend to restrict by removing "_"
- Issue raised by Dan S.: NCName also restricts the starting character - an identifier can't start with a number, which is incorrect for DDI. He believes the intent was the looser general definition and not the XML restrictions regarding starting characters.
AGREEMENT on the following:
when NCName was used in documents it was meant as a general term for xs:string sans ":", not the XML NCName type
SUGGESTED solution (a small illustrative check follows below):
define as a string without colons
use xs:string as agency part sans ":" "_"
use xs:string for id sans ":"
use xs:string for version sans ":"
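A minimal illustration of the suggested wording (plain strings rather than XML NCName), assuming nothing beyond the exclusions listed above:

def valid_agency(value: str) -> bool:
    # agency: any non-empty string without ":" or "_"
    return bool(value) and ":" not in value and "_" not in value

def valid_id_or_version(value: str) -> bool:
    # id and version: any non-empty string without ":"
    return bool(value) and ":" not in value

assert valid_agency("int.example")        # hypothetical agency identifier
assert not valid_agency("int_example")    # underscore excluded for agency
assert valid_id_or_version("VariableScheme-123")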
ACTION NEEDED: Circulate this suggested solution for comments and agreement among TC prior to next week's meeting (preferably by Tuesday so that Achim can incorporate this solution into the document)
Internationalization issue raised by Achim:
Do we support internationalization (more than ASCII)?
Discussion:
- Goal is to leave the filing loose and enforce stricter rules within DDI
- Is Internationalization best dealt with in the filing or in the DDI restrictions
- When you allow internationalization within a URN it needs to be dealt with by the resolution agency
- Are internationalized characters a new requirement?
- NCNames are internationalized strings so it comes up
- Should we limit to ASCII? We have never previously allowed any characters outside of ASCII.
- The domain name system changed to an internationalized set
DECISION: Limit to ASCII
Issue #4 Version is required for identification
Discussion
- If used by a reference there should be an option to express a late-bound reference
- Can it be registered as optional and enforced in the use in DDI?
- Request in HTTP for flag for late-bound - service level
- What does latest mean in different systems (cryptographic hash)
- Doesn't make sense to make it optional as a means of expressing late binding
ACTION: Resolve whether version should be optional in namespace document
ACTION:
- Nail down decisions soon -
- Move document and minutes to confluence
- Directions for comments in notes
- Try to come to agreement by next Tuesday
Extensions mechanism:
- Use of attribute pairs on all versionable objects (a reading sketch follows at the end of this topic)
Example: privacy preservation, where there are a lot of small flags on objects regarding past processing and status - Question: What are the best practices on more extensive extensions and how this is managed for interoperability
- Can we write recommendations/guidelines?
- What about when the object is not versionable, which means you have to do it one level higher
- Versionability and extensibility
- DDI 3.3 has user pairs on all identifiable objects but there are still un-identified objects that occur
- Requires a lot of bookkeeping to track which sub-object a user attribute should apply to
- Do we want to provide recommendations?
- When users encounter situations where they want to extend can we provide guidance on how to do it and if things should be filed for change in standard
- There is a balance: with content that is easy to extend you are going down a road of fragmenting DDI.
- Privacy has been raised by a number of people which requires a larger discussion
- Some of this is system specific issues and how they are handled
- How do we integrate other systems that may describe what is needed into DDI
- Raise as a DDI-L 3.3 issue on guidance - added documentation
Define what type of things may be needed - how to differentiate between what should be treated as system specific
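A rough sketch of reading such key/value extension content back out of an instance. The element names (UserAttributePair, AttributeKey, AttributeValue) and the namespace URI follow the reusable-namespace pattern of DDI-L 3.2/3.3 but should be verified against the published schema; the file name is a placeholder:

import xml.etree.ElementTree as ET

R = "{ddi:reusable:3_3}"  # assumed namespace URI for the reusable module

tree = ET.parse("instance.xml")  # placeholder DDI Lifecycle instance
for pair in tree.iter(R + "UserAttributePair"):
    key = pair.findtext(R + "AttributeKey", default="")
    value = pair.findtext(R + "AttributeValue", default="")
    print(key, "=", value)  # e.g. a locally defined privacy-processing flag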
UML Documentation -
- On the documentation page there are cardinalities of 0..1 which should be 0..n - some of this is due to the CSV content. In other locations there are questions.
- Clarify which are XML to CSV issues and which are UML production issues
- Some associations are missing from UML - references from properties seems inconsistent
Procedure:
Check the 3.3 XML schema to CSV
Check the document generated from the XML with Oxygen
Next Week's Agenda
URN namespace for DDI
COGS review items (XML to CSV, CSV to UML/XMI)
Future Agenda Item:
Best Practices:
Email on IASSIST list: how can I find metadata documents?
Adding ?.ddi32.xml to file names
Deflate/compress through gzip - we expect that you will be getting DDI documents (a small sketch follows below)
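A minimal sketch of the gzip point, with placeholder file names; the .ddi32.xml naming and the expectation of gzip-compressed content reflect the notes above, not settled guidance:

import gzip
from pathlib import Path

def read_ddi(path: str) -> bytes:
    """Return raw XML bytes, inflating gzip-compressed content when present."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(raw)
    return raw

xml_bytes = read_ddi("study1234.ddi32.xml.gz")  # hypothetical export name
print(xml_bytes[:60])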
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry, Oliver, George
SDTL
- may forward to others and will let Barry know
Codebook - set up list
Description of DDI Suite of products (general verbiage)
https://docs.google.com/document/d/1MM1sCgvuo_wWguDyxq68NBLtGHDNp0khtLfQRPv3Jxs/edit?usp=sharing
- focus on the suite description and worry about the other pieces later
- Send link to MRT group and others -- Marketing, Training, etc.
- Deadline for comments
TC responsibility for DDI Alliance web pages
- Good we know who is responsible for what
- Work with Training to define what those content pages should be
- Stuff that's going on in the Training area needs to be technically vetted by the TC
- Organize a call with Anja and Jane on coordination
- Talk to Marketing (Barry) about where coordination is needed
Update on URN namespace work
- Only other issue: are there other people in the community whom it would be worth having review it prior to sending it off?
- send JIRA issue number (TC-4) for background reading
NEXT WEEKS AGENDA:
- COGS content review (UML documentation, XML to CSV verification)
- File name extensions for Best Practice Guide
- How to extend DDI using userDefinedAttributes and general extension by inclusion of DDI
- URN namespace document (see TC-4 for background)
ATTENDEES: Wendy, Jon, Larry, Oliver, Jeremy, Barry, George, Dan S.
Trademark
How to protect DDI from some sort of predator taking over the name
Does it cover all of our products
Jared needs to talk to the lawyer
Barry
How to position the suite of DDI products
Nail down the "DDI Suite" description
Reviewed contents of SDTL Review Page
https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/1120370729/SDTL+Review
SDTL role in the DDI Suite
Purpose and Benefits section should be copied to review page
Benefits section is clear statement of role of DDI
Went through checklist provided earlier to George
IP regarding CC BY 4
Referred George to the use of readme.txt files provided in DDI packages in terms of attribution information
Added note:
There is a meeting of the SDTL working group on Monday - Wendy and Jon will attend and go over review process
FROM CHAT:
From Larry Hoyle to Everyone: 09:20 AM
https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/899547182/SDTL+-+Structured+Data+Transformation+Language
https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/1120370729/SDTL+Review
From Dan Smith to Everyone: 09:29 AM
From the minutes last week:
All of the usage and testing of SDTL has been retrospective using existing content and testing it
Use cases should include prospective cases - suggested test
VTL to be used as a processing language - can you write SDTL and produce usable VTL etc.
Link to gitlab location - link for tools
http://c2metadata.gitlab.io/sdtl-docs/master/
CLICK to file a comment
Please provide your name and email in the body of your comment
https://ddi-alliance.atlassian.net/CreateIssue.jspa?pid=11703
ATTENDEES: Wendy, Dan S, Larry
Updates:
CDI webinar (Larry) - about 18; some technical problems but they got through; another next week
Things learned about putting on webinars: Arofan was doing the talking and there was no way to get his attention when there were problems - need for a side channel (e.g., cell phone with text, or multiple monitors) to communicate regarding problems or questions
One to present, one to take notes, and one to monitor chats etc.
Will relate these points to Marketing and Training Groups
Dan S. will join the DDI-C group monitoring solutions
SDTL meeting soon
Suggestion:
All of the usage and testing of SDTL has been retrospective using existing content and testing it
Use cases should include prospective cases - suggested test
VTL to be used as a processing language - can you write SDTL and produce usable VTL etc.
COGS: Flavio, Wendy, and Jon are testing the UML, XML, CSV for transformation errors
Project page for the converter https://github.com/Colectica/DdiToCogs/issues
Will pass on to Jon and Flavio
ATTENDEES: Wendy, Larry, Oliver, Flavio, Jon
Just touching base with each other
Status/Update
CV waiting on some technical work from CESSDA prior to DDI publication - Oliver will confirm the plans to address identified issues
Review Codebook group approach
One of the objectives is to get Codebook out of being a manually created product
COGS extract to a codebook model would be problematic
One of the long-term issues is how flexible the structure of Codebook can be
Could there be a one time change but we need to know the issues of backward compatibility and what the trade-offs are
Could be part of a public review or could be a separate review
COGS output evaluation of XMI and documentation:
Error in cardinality of choice
The error is in the transformation program - choice
The documentation is right but the UML is wrong - cardinality of physicalInstance/physicalStructure
Single source - so its the transformation
Tracking where the problem is arising
Items that are associations are also listed as properties but the cardinality doesn't match between the two listings
Need to track down where the problem occurs - Flavio will provide Jon with examples
Flavio will also add TC issues regarding the use of and requirements for the XMI produced by COGS
ATTENDEES: Wendy, Jon, Jeremy, Larry, Dan G., Dan S., Oliver, George Alter (guest)
AGENDA: SDTL Review Preparation
Discussion regarding preparation of SDTL for publication by DDI Alliance:
Web presence- where it will be
- Wiki - is a working area
- Product documentation should be in COGS under source control
- DDI Alliance site - this is an entry point. There is a description on the developing products page. When published, a full page with links to the package, User Guide and anything else you need will be created. Work is being done on the DDI site to provide better access and control of content. We will continue to work with George on this.
What is going to be the package of information:
- SDTL product maintained by Alliance
- Model - COGS rendered out in different format
- Ecosystem around it - what is the relationship between model and ecosystem
- The Model is the only thing that makes sense - what's in COGS and related documentation
- Link to Ecosystem from DDI page
Status : mature version of SDTL covering data transformation commands found in statistical packages (main library)
3 parts:
1 - Core of SDTL - JSON schemas of SDTL Commands
2 - Function Library - crosswalk library of functions in SDTL mapped to functions in other languages (SPSS, SAS, R, Python) - this will be extensible (100+ functions)
3 - Pseudo-code Library - set of templates to create human language text to surround parameters in SDTL command (this one is in English) -- is this tooling or part of the model; tools can be built on it - could be a question for the review - should provide the schema used by the tooling
Documentation of these three parts (an illustrative sketch of the part 1 JSON Schema pattern follows below)
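As an illustration of the part 1 pattern only (JSON Schema validation of SDTL command objects): the schema and the command below are invented for this example and are not taken from the SDTL package:

from jsonschema import validate  # third-party package: jsonschema

command_schema = {
    "type": "object",
    "required": ["command", "variables"],
    "properties": {
        "command": {"type": "string"},
        "variables": {"type": "array", "items": {"type": "string"}},
    },
}

example_command = {"command": "Compute", "variables": ["income_total"]}  # hypothetical
validate(instance=example_command, schema=command_schema)  # raises on failure
print("valid against the illustrative schema")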
Link to Tools such as
- Software is a separate development and not a part of the product - example MTNA has one for Codebook 2.5: Takes a DDI file and incorporates SDTL content, new variables, etc.
Some background information:
- SDMX-VTL was looked at but ran into a number of problems and limitations. Essentially VTL is similar to programming languages and therefore didn't address the problem. Discussions with the group raised the possibility of adding translators to VTL (a possible intermediary between statistical package and VTL)
- CDI could use in same way as Lifecycle
- Documentation on integration with these standards
- Pseudo-code - could be used with Jupyter Notebooks with Markdown, so the Pseudo-code Library is something tools could be built on
- COGS documentation would be the entry point of the model and then lead to the bindings
Are there any implications of where the COGS instance resides
- We need to coordinate where hosting is on GitHub
- One question is what documentation can be provided by COGS. At some points there are links into COGS
- Example locations and User Guide similar to Lifecycle -
Could create in GIT for user guide which can be updated and produced separately for more frequent update
The User Guide would provide high level documentation and then examples in COGS
It would be good to mirror this early in the DDI Alliance Git installation. Use GitLab as the development space and then mirror accepted content in GitHub. When the C2Metadata project closes down it would just be in GitHub (Jeremy and George will work on this)
What questions will be proposed on the review page. This will be discussed during the next month to reflect the needs of the SDTL group and any TC concerns.
NO Meeting due to virtual meetings of Scientific Board and Members Meeting of DDI
ATTENDEES: Wendy, Dan S, Flavio, Jeremy, Larry, Oliver
Apologies: Jon
Presentation reviewed and finalized
SDTL - plan to put out for Public Review on July 1
- Statement of relationship to DDI in general, and specific
- What is there is in good shape
- High level documentation
Technical feedback on bindings -
- email sent to DDI-Users, IASSIST
- Let the latest release be up there for a bit - waiting on feedback from 3.3 to make sure there are no issues there
- Ask for this from the Scientific Board members - to pass on - do first
- Benjamin Zapilko (out), Franck Cotton/Thomas
- Alexander Mueldauer - Oliver will pass to him
- FSD people - send to Mari and ask her to pass it on
- Darren Bell -
- Codebook work will progress after SB meeting
Roadmap - DDI Product Suite
- Purpose of this work - clarify and state
- Pull together modeling approach
- What do we use as the entry point for getting into the work
- Different models and how they relate - GSIM and Lifecycle
- How they compare
- Common vocabulary should be encompassing of all DDI published products
- Kind of a GSIM-plus-plus level - keep it neutral, include some broader conceptual model of social sciences research work - agnostic terminology
- Get a basic model prepared to discuss with others before bringing them in
- Examples of what we are going for
- Content (objects) and Relationships (linkages)
Meeting Cancelled next week
ATTENDEES: Wendy, Jeremy, Oliver, Larry, Dan S.
Membership drive:
- Maybe pick up someone from the codebook group
Members meeting:
- Lifecycle 3.3 out 3.4 purpose
- Emphasize value of face-to-face in getting work done
SB meeting:
- Added slide on Roadmap
Announcement for tech info:
XML, RDF/OWL2, JSON Schema, UML, and others listed on the page; XMI 2.2 and 2.x; graphic as SVG
Take a look at what is there
Is it something you can implement
Does it follow how other bindings are represented
comments on the generation of a binding in COGS repository (https://github.com/ddialliance/ddimodel/issues)
others in terms of content in DDI repository link to DDILIFE issue creation
we will trust their good sense
Comment on model work presented last call:
Like the overview charts that we showed on the DDI lifecycle - charts for visualization
Ones for content area should also be restructured based on usage - subset of items for specific usage
Study/repeated study, question banks, 4 main levels of content that link together
Data files, instruments
These were the basic starter content providing a basic framework - getting started on different common sets of metadata for specific purposes
ATTENDEES: Wendy, Jon, Flavio, Dan S.
UPDATES:
Codebook:
sent email announcement and in the first few days have received 6 responses. Will be sending this to specific people identified earlier in discussions and then set up a work process.
Bindings:
Will send out call to look at bindings to the DDI-Users and IASSIST lists
Jon will provide some text
Membership:
Waiting on 3 responses
Will draft memo to Alliance members in preparation for meeting
Members meeting:
Presentation agenda determined
Writing annual report
Preparing slides
Announcement
Version 6 of Colectica will go out next week supporting 3.3
Roadmap work: DDI Suite of Products - overall model
(Quick notes)
Reviewed work to date from Wendy and Flavio:
Lifecycle model
Packaging was not as helpful
But information objects and relationships are clearer especially in terms of upper model and how information held in one place is used
How do we represent commonalities and differences between multiple models in terms of coverage and translation (gaps etc.)
Metadata Roadmap - StatsCan (GSIM)
Parts they care about color coded to types of information
Helps identify things we don't have (example: Population)
Added some things that are not purely GSIM in white but things they are interested in
Differences in modeling like the logical and physical which affects use of content
Data Platform
Visualize what data/metadata is used in what way
What is managed
What is searched
What is used for relational information
Need for transformation
Where different products may facilitate use or interaction with users at different levels
Technology independence - next 5 years of work
Metadata repositories
Lots of work with FAIR practices in terms of metadata
All are types of data objects
Green: persistent and unique IDs and links - other structural means of similar capabilities (data description)
From the point of view of what we do with this content and how it is organized
There are things there that are more refined than GSIM but that are needed to make the system work
Shows the broad range of vocabularies needed
Methods and formats used, frameworks used
Step back and provide a requirements view
From CDI it may be useful as their focus is linking to other standards
Provides a bit of a reality check
Clarifies what languages really need to work together
Mapping between DDI Lifecycle and GSIM
Perspective differences
High level view of DDI Suite
What do we do with the other standards - in terms of what
In general what product covers what
Definitions of what applications are intended for each product - verbal
Structured description of Suite and each product in terms of specific coverage, depth (description, machine actionable), application
Use to identify gaps between DDI Suite and other standards and what needs to be added
Verify version of GSIM used https://statswiki.unece.org/display/clickablegsim/Base+Group
Review old work on GSIM
How to integrate new work - into a new product, interaction with existing products, integration with existing product
TO DO:
Write up workplan and how to move forward on these different areas and uses of this work
Difference between codebook and lifecycle is more than just content - technical overhead and application (identification versioning etc.)
Agenda for next week:
SB and members meeting
Annual report
Presentation (slide decks) - review and edit
ATTENDEES: Wendy, Jon, Dan S., Flavio, Oliver, Larry, Jeremy
AGENDA:
Wendy sent out email to all members asking interest in continuing to serve and area of focus. Due back by end of month when search for new members will take place. This allows us to see areas where we are weak in terms of focus areas. Continue to seek members who are willing to be active in specific focus areas and retain interest and oversight with the broader work. Attendance at meetings is driven by agendas
Specific tasks for the next 3-4 months identified
3.3 high level documentation - continuing work task (Jon, Wendy)
monitor DDI-CDI - follow up on approaches (Larry, Wendy) - link to webinar interest spreadsheet https://docs.google.com/spreadsheets/d/1hQm9OTuQmRAvBC64HWgNk_cD_rttC-PHhxlOgmj8Tuc/edit#gid=0
-- email of the 23rd from Achim
Codebook: send out announcement information; contact specific members/external contacts; discussion as part of members/SB meeting (Wendy); identify a group of people and set up tasks
--give a month with reminders at the members/sb meeting (Wendy)
--set up remaining tasks (Wendy)
--will need to continue to extend timeline (timing is less critical than coverage)
Roadmap work
--get some of the models together - pulling something together for members meetings (Flavio Wendy)
--preparation for mapping work; identify interested people to work on this outside of TC
--Defining products within the suite (Wendy, Barry, CDI, CV, XKOS, SDTL)
Lifecycle 3.4
--someone writes down what we think the outcome of the review should look like - how do we know it's done and correct (Jon)
--content carry over - is the XML equivalent - expectation of what we should get and compare to output (Oliver, Jon)
----what do we mean by COGS - the program rips through what we have using assumption files and dumps out the bits and bobs - what could we do to check it?
----two programs - schema extraction tool functions correctly - not losing anything and putting the right thing in the right slot - were all the weird things caught and were they caught correctly
----other program deals with output
--XML and UML output maybe JSON - Oliver
----put out a call on list for people to look at bindings for views and comments - get out on users list (Jon)
----each output idiomatic for those languages
----does the style of the output make sense for the binding - does it look right to users in terms of specific serialization format
----if there are multiple styles - what style are we going with and why
3.3 high level documentation - continuing work task (Jon, Wendy)
Wendy will organize identified tasks and use as basis of future agendas
Annual Report:
What we did
What we plan to work on
Highlight - major version FINALLY got out
significant expansion of the standard into areas where it has been weak
opened up existing parts of the standard to broader application
better relationship to work of GSIM
work done around survey methodology - expanding out into an area where other standards are scarce
Presentation part:
DDI Suite approach - implications
Work on Codebook
Technical input
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Larry, Oliver, Flavio
Update on publication and review status
--Lifecycle waiting on posting of DDI page for 3.3 (today)
--CDI going out today for review through 31 July
Update on changes to DDI Alliance web pages
--Changes under About done
--Changes planned under Standards
Update on SB working group
Upcoming agenda items - revisit plan and put some concrete tasks in
--Codebook update
--Improvements to the registry - HTTP solution
--Sanity checking 3.3 in COGS in terms of matching content
--Documentation work for 3.3 continue and eventual generation of updated 3.2
--Check status of broad conceptual coverage with Flavio
ATTENDEES: Wendy, Jon, Dan S.
Work Products
Looked at Work Products page:
Should be published work products
Talking about what is coming is not productive for the products page as it has caused people to wait
- Separate page for development work
Cover DISCO, CDI and SDTL - what it does and what it covers
- Recommending to Jared that the page include only currently published work products and have a separate page covering products under development, but they should focus on what the product does now, not a lot of hand waving
- So DISCO, CDI, SDTL would go on the development page
PHDD - what should happen to this? - Clean up vocabularies - remove RDF Vocabularies
Under drop down box Specifications:
DDI Codebook
DDI Lifecycle
XKOS
Controlled Vocabularies
DDI Agent Registry
Under About:
Work Products of the Alliance -
Describe the DDI Suite description
Short statement and link to product page for published products
New entry for Developing Products
Include:
DISCO
CDI
SDTL
Lifecycle 3.3 publication
- April 15, 2020 is official publication date
- Jon will update dates and send to Dan S. for sanity check
- Jon will provide Michael with package and information on 3.3 to post
Wendy will pull together the announcement information and send to Jared
ATTENDEES: Wendy, Jon, Larry, Dan S., Jeremy, Oliver
Agenda: Defining "Technical Contact"
Related pages in Wikipedia
ATTENDEES: Wendy, Dan S., Jeremy, Larry, Oliver, Flavio
DDI-CDI:
- Reviewed remaining documents for clarity and clear instructions
- Specific changes will be relayed to Achim and Arofan
- For details regarding pdf diagrams in Model / Specific Documents issue we ask them to consult Flavio and Larry
- Wendy will send all but detailed example corrections today (example corrections to be sent Friday)
- Once corrections are made and repackaged the content will be ready for review period
- Update to the Moving Forward Project/DDI4 information on ddialliance.org must be completed prior to review announcement
DDI-L 3.3 vote
- 13 votes as of last weekend...Jared has sent out reminder
DDI-C
- Taina has been providing substantial input on CESSDA needs as well as usage requirements for existing issues
- Will be ready next week to begin requesting broader input
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry, Oliver
Agenda:
DDI-CDI review update - final package is complete and being uploaded
- Will notify TC with link to package and TC review page when upload is completed
- Goal of getting it out by April 1 is OK
Codebook review approach - setting up content for an announcement
- went through content for informational pages - will set up pages and send links for comments or edits
Other topics
Membership - contact current members about continued interest and solicit new members
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Oliver, Flavio
DDI Lifecycle 3.3 Vote:
- Documentation has been transferred to
https://github.com/ddialliance/DDI-Lifecycle-Technical-Guide - HTML view is at https://ddi-lifecycle-technical-guide.readthedocs.io/en/latest/#
- It's getting there slowly - mostly the work is on the specific structures section
- Mostly content stuff - images and more content will be added over the weekend
- Inform Jared this is ready to go out for vote
DDI-CDI
- Should get the final package today.
- CDI is seeking specific review audiences and are writing their own focused invitation to review emails.
- TC should address the review notice to the general DDI audience along the lines of the XKOS review request.
The DDI Alliance (http://www.ddialliance.org/) is pleased to announce the public release of XKOS version 1.2, a free and open specification that facilitates sharing and management of statistical classifications.
XKOS is a simple and elegant model to represent complex classifications via their structural and textual properties, as well as the relations between classifications. By leveraging the widely-used Simple Knowledge Organization System (SKOS), XKOS extensions allow linkages to a wealth of existing thesauri and taxonomies to better support statistical classifications and concept management systems. XKOS also refines SKOS semantic properties to allow the use of more specific relations between statistical concepts, classifications, or any other case where SKOS is employed.
XKOS is currently used at a number of national statistical agencies and has been available as a draft since 2013. Since undergoing a final review in 2017, all issues have been addressed and corrected in preparation for the publication of version 1.2.
Links to the specification and instructions for comment are found at:
http://www.ddialliance.org/Specification/RDF/XKOS
NOTE: The above was the public release notice. The following is the text of the public review:
The DDI Alliance is pleased to announce the Public Review of XKOS, an RDF Vocabulary which extends the Simple Knowledge Organization System (SKOS) for the needs of statistical classifications. It does so in two main directions. First, it defines a number of terms that enable the representation of statistical classifications with their structure and textual properties, as well as the relations between classifications. Second, it refines SKOS semantic properties to allow the use of more specific relations between concepts. Those specific relations can be used for the representation of classifications or for any other case where SKOS is employed. XKOS adds the extensions that are desirable to meet the requirements of the statistical community.
Links to the specification and instructions for comment are found at http://www.ddialliance.org/Specification/RDF/XKOS. We are eager to obtain feedback from the DDI and RDF communities on this vocabulary. The comment period is open until January 31, 2017, and we hope to hear from you.
- RE: email for CDI review; add in a statement about reading the overview on the comments and starting with the linked documents in order of appearance in the overview document / specification documents folder.
DDI Codebook
- merge request DDICODE-43 -done
- proposed solutions have been provided for DDICODE-49, 51, and 59 - went through no additional comments
- Documentation has been added for DDICODE-38, 44, 36, 56 - does more need to be done other than entry work? - no
ATTENDEES: Wendy, Jon, Flavio, Larry, Oliver, Dan S., Jeremy
Lifecycle 3.3 vote
Have something by next Thursday - fighting with Sphinx
Technical guide done and User Guide will be good enough to go out for vote
--Statistical Classification
--Methodology
email reviewed
CDI Review
TC review of CDI should address:
--things are there
--things are working
--clear where to start or approach
Things to finish before face-to-face meeting
Review of binding output
--thoughts gathered on each binding
----RDF approach has been back and forth
--UML purpose - start gathering ideas
Mapping not so much but would be nice
--how are things produced; Lifecycle and CDI take different approaches - what are the long term plans
--tools perspective of different products (PIM, PSM, what's doable)
--moving between different products (bindings and products)
--can we begin, work on, a road map
--going back and pull past discussions on several major topics
--resolve some of the cognitive dissonance between products (DDI and others) - what are the implications for bindings
--purpose of each product - what technology/activity does it support - can we start profiling this
----preservation, maintenance, actionability, information transfer
----domain knowledge needed therefore possible domain specificity
Jon can start looking at what Lifecycle 3.4 is or could be in early April
NEXT WEEK AGENDA:
--last minute Lifecycle 3.3
ATTENDEES: Wendy, Dan S., Flavio, Oliver
AGENDA:
- Face-to-Face meeting proposal finalization
- Prep work for DDI-CDI review - what is our check list?
- starting preparation for Codebook work
Summarized status of mapping
Flavio has had a chance to start looking at the material sent by Wendy and will be looking at GSIM to DDI 3.3 mapping within context of Statistics Canada. One issue is how to represent different levels of mapping (conceptual, information objects, etc.)
Try to get this moving in March due to April time constraints for Wendy and Flavio. Then we can start looking for broader input
Face-to-Face meeting proposal
- Send out draft for final comment
- Use of COGS-CDI and role of UML in COGS should start sooner
- There are a number of issues that can be scoped out earlier - these should be identified and scheduled
Jon not available in September
Oliver one week not available (21-25 Sept)
MRT is planning to meet in first week of October with a cross discipline meeting the next week
CDI -
- Set up page (draft done)
- Set up new issue tracker (done)
- It will be a single download package
- Instructions within the package
- Instructions will need to go on the page
- How to approach review
- Check that links in documentation work
- All parts are there
- Instructions are clear - check these for understanding and functionality
- Send out when we receive with list of things that need to be checked off and get started reviewing
Contact Arofan to see if there are background documents that are ready and available that would be useful to list similar to earlier reviews (done)
Codebook
Have made progress on specific issues and am preparing a statement for soliciting input from Codebook users. Get moving in the next few weeks.
ATTENDEES: Wendy, Jon, Jeremy, Larry
Codebook:
- Merged and closed items In Review
- 5749 spelling Purpose
- Decision to enter fixes as Resolved, notify members to review and then merge outside of meeting
Lifecycle:
- Closed single issue filed
- Added details to voting and publication actions
Face-to-Face meeting:
Pick up where we left off on the face-to-face meeting
--is there a need to meet around codebook
--planning around Lifecycle 3.4 - outcome of evaluations around serialization, logistics of moving away from being XML centric technically and culturally (see roadmap), suite of products
--In terms of timeline as we were thinking last year...Septemberish?
Financing: similar to last year - justify the higher amount by need to get the right people in the meeting
Timing: September planning gives us a deadline for background information acquisition (post-Labor Day) - need to check about space availability
Location: Minneapolis
Topic:
- Quality time around 3.4
- Vision - how do we get out of being just an XML Serialization; what does this means in terms of marketing
- Bindings - RDF, JSON, UML
- How do we fit into the broader discussion with DCAT, PROV-O, ... whats the best way of packaging these up
- Implications for production, maintenance, tooling, training
- How can we leverage COGS across products
- Can we use COGS for Codebook? for which standards in terms of production [some of the long term Codebook issues in terms of flexibility need to be discussed first]
- Compatibility of COGS and CDI in terms of conflict between UML and Registry system
Inclusion:
Wendy
Jeremy
Dan S.
Jon - not available in September
Oliver
[Larry, Flavio, Jay, Dan G.]
Johan
Send out a general interest request for participation with a clear agenda, background information, etc. (specific names mentioned included)
Achim
Ben (ISRDI)
Ornulf
Benjamin Zapilko
Need to get discussion of the future structure of codebook going sooner rather than later. Identify who is using it and what they use in terms of specific objects and sub-features (mixed content with specific fields like concept) are being used.
Contacting users regarding Codebook:
MTNA use
Cornell
World Bank
Eurostat
Canadian Libraries
NHGIS - NCubes
SciPo
DataVerse
Pascal
SND - other CESSDA organizations
ICPSR list
ATTENDEES: Wendy, Jon, Dan S., Jeremy, Larry, Oliver
Lifecycle
Documentation - checklist for vote
- High level documentation - data collection, description, New item documentation - methodology, GSIM, classification, sampling, weighting, development processes
- Examples
- Focus on the technical documentation
Vote announcement:
- Are all the parts there - schema, adequate documentation (field-level, high-level)
- Change log describes the new content and means of transformation
- Has technical review work been done
- First 2 sentences should clearly summarize what the thing is - what Lifecycle 3.3 is and what has been added/why
- What Lifecycle in general is - most people know this - this is more important when introducing new products
- Date for vote - less than a month from end of review (by March 20)
Modeling work: -
- general high level model is the priority
- Wendy and Flavio will keep updated - basic work is progressing and then we need to broaden the group working on this
Codebook:
- See filter Codebook_outstanding
- Items reviewed and labels added
NEXT WEEK AGENDA:
Pick up where we left off on the face-to-face meeting
--is there a need to meet around codebook
--planning around Lifecycle 3.4 - outcome of evaluations around serialization, logistics of moving away from being XML centric technically and culturally (see roadmap), suite of products
--In terms of timeline as we were thinking last year...Septemberish?
ATTENDEES: Wendy, Jon, Jeremy, Dan S., Oliver, Larry
Apologies: Flavio
TC-213 Workplan - General
- Review need for addition to task detail, timing, priority
- Identify who is interested in which areas of activity
- We will need to pull in outside people in many areas to get the knowledge base for development and review
- We should review membership this year, verify that listed members are still interested in TC, and post a request for new members
- Send out a reminder to comment particularly on documentation changes
Distribution of member effort at least in short term (through June)
Lifecycle 3.3
broad group project not a lot to do specifically
CDI
broad group project not a lot specifically primarily keeping in touch and following-up where asked to by MRT in terms of iterations, reviews, and publication issues
Codebook
Codebook...is it a full 2.6 release or a 2.5.x? What should it encompass?
Jon - interested but not a lead role - documentation
Jeremy - no
Dan - participate in technical review
Larry - pretty booked through the spring
Oliver - not that much not that involved in Codebook - maybe ID some GESIS people
Wendy - yes; will lead on this and layout options for broader involvement and input
Flavio - ? probably not
Roadmap work - note that this is not a TC activity alone but involves Marketing and the broader Scientific Board community
Need to pull existing resources together - what are the output going to be and how will they be useful
Jon - helping out
Dan - no not really
Jeremy - no
Larry - don't know
Oliver - later on
Flavio - yes, conceptual model and mapping
Wendy - yes
3.4
Jon - yes on content carried over - there is software to do this; results need to be reviewed
Jeremy - yes
Dan - yes, but needs outside eyes to review as he wrote the transform; make sure substitution stuff works; will not do binding review but will go through reports, address, and fix
Oliver - XML binding review and will ask people to look at RDF binding
Flavio - UML binding; what is the intent of the UML in terms of a product (current was intended only for documentation purposes - diagram creation, publishing for people to look at)
Wendy - XML and UML
3.2 high documentation
Jon and Wendy primarily
ATTENDEES: Wendy, Dan S., Jeremy, Larry, Oliver, Flavio
APOLOGIES: Jon
AGENDA:
DDILIFE-3681 resolved; Dan S will enter correction
Work plan for 2020
DDI Core review - provided update
- Use DMT issue tracker with label to enable filter
- Wait to see whats in their package
DDI Codebook
- Model is going to be a long term issue
- Identify users of Codebook usage - who needs to support it
- Development of European Question Bank
- Areas of conversion between D and L -
- Short and long term directions
- Large meetings (structure and offer)
- First half 2020
- Set up issue TC-213 to track development of the 2020 work plan - added new label "Workplan-2020"
- defining the DDI coverage area - how
- Broad definition of the "area" of DDI - what do the products cover
- Focus on moving information from one to the other
- What does integration mean
- Start now to define and fill in content in the 2nd half of the year
- Flavio and Wendy will pull something together in the short term to outline work and identify internal and external people to involve
DDI 3.4 would be just a serialization of the COGS system
-- TC should look at binding structure and organization following publication of 3.3
-- JSON and XML have been most tested
-- RDF should be reviewed regarding style and content
-- UML determine the purpose and functionality of EA style UML - what can make it more useful -should it be canonical
-- what type of model do we want to work with as an implementation model
-- how does it relate to the MF approach of PIM and PSM
-- are there things that are needed in terms of directionality
-- need additional outside people who work with these approaches
DDI 3.2 high level documentation
-- post 3.3 publication issue 3.3 is the priority
-- what is too much effort; it's not a very high priority
-- continually review priority and payoff for this work
-- maybe just low-hanging fruit
Present: Jeremy, Flavio, Dan, Oliver, Larry, Jon
Release package: schema, field level docs, license and readme
Other items are referenced on Review page
DDI Codebook – need an email out to community asking for input into requirements (over and above existing JIRA’s). Possibly approach known users e.g. J Gager / World Bank, ICPSR, CESSDA as experts.
Technical Committee F2F – this is probably worthwhile, need clear rationale and work to justify expenses. Is this in the budget for 2020/21?
ATTENDEES: Wendy, Dan S., Larry, Oliver, Jeremy
TO DO List:
--enter updated examples
--verify that pretty print schemas have been used
3.2 documentation has also been updated and is available - announce the model documentation when we release updated 3.2 documentation
Do we want a note that additional examples completed after publication may be found in the High Level docs (link to)
Quick review of Schema docs looks good (Larry, Jeremy)
Should the set of examples be outside the package? Would need to change the reference to the schemas in each. REMOVE the schema reference completely because most systems use locally assigned references. Usage is to see how something is done.
consensus is this is a good idea
Assuming that an official example is not a full instance this will
Move up FieldLevelDocumentation Folder on level with XMLSchema folder (only thing now left in Documentation folder)
3.2 review email
http://lists.icpsr.umich.edu/pipermail/ddi-users/2013-November/000802.html
Add new content from last 3.3 public review page
Decided on review period Jan 24-Feb 21 (4wks)
Package will contain License, Change Log, Readme, XML Schema folder, Field Level Documentation Folder
Links on review page: Model Documentations, High Level Documentation, Examples
Note that these links should be moved to the DDI website with publication
ACTIONS:
Finish review page set up
Finish draft of information for announcement
Notify Barry and Jared following discussion with Jon on Friday (1/17)
ATTENDING: Wendy, Jon, Larry, Oliver, Dan S.
Current status of documents
https://ddi-lifecycle-documentation.readthedocs.io/en/latest/index.html#
Main issues:
New 3.3 material
How to use new documentation because DocFlex is a problem and needs special production each time
Not straightforward to reference into or within it
Can't reference from the technical documents to the DocFlex so we need to point to the model
Need location for the committed set - Dan will rerun to create the latest COGS version which produces a model
The issue with the model documentation is that it will have the type but not the elements - not XML documentation but structure documentation
Need to clearly define what is where (which document type to use)
One is the XML schema documentation and other is structure documentation
Fragment seems to be missing in model documentation as the containers are auto-generated so we need to have documentation produced for auto-generated structures (Dan will fix)
FHIR has a mapping between their elements and ISO/IEC 11179; we need this for use of Dublin Core and XHTML
Proposing we link to the DDI model using a set of topics as the entry points
Specific topic listed in above
Jon is currently moving things around and so all should look at and provide comments to Jon regarding improvements
Next Thursday should be the target
TODO: wendy set up web site
High level docs are under development
Dan will rerun latest documentation based on schema changes
Draft announcement - Wendy
Hold on posting minutes until we have new link from Dan for commenting
Reviewed DDILIFE-3662 added Lifecycle to title (merged) and closed issue
Agenda Topic Index Key
2018-2019 Minutes Page
2016-2017 Minutes Page
Pre-2016 Minutes Page
AGENDA TOPIC | DATE |
---|---|
Resolution System - work done by TC | 20211111 20210923 20210916 20210909 20210902 20210819 20210722 |
Agenda for TC face-to-face | 20211209 20200227 20200220 |
Mapping of products | 20211111 |
Controlled Vocabulary Publication | 20210701 |
DDI Site Pages | 20200304 20201217 20201210 20201203 20201112 20200924 20200917 20200827 20200813 |
Lifecycle 3.3 | 20200409 20200312 20200305 20200220 20200213 20200123 20200116 20200109 |
Codebook update | 20211104 20211028 20211021 20210408 20210401 20210225 20210128 20200312 20200220 20200213 20200206 |
Roadmap - COGS | 20200723 20200716 20200625 20200206 |
CDI review | 20200409 20200326 |
SDTL Review | 20201119 20201029 20200903 20200618 20200528 |
2020 Workplan | 20200305 20200206 20200130 |
URN filing | 20201112 20200716 20200709 20200702 |