SDTL Review

Purpose of Review

Comments are requested on SDTL’s utility as:

an independent intermediate language for representing data transformation commands for data stored in common statistical analysis packages
a machine-actionable description of variable-level data transformation
a representation of the SDTL model in different formats (e.g., JSON, XML, OWL 2 RDF, UML)
a crosswalk (the Function Library) between a standard SDTL representation of each function and the implementations of that function in various statistical languages
a human-readable translation of SDTL commands (the Pseudo-Code Library)

Scope

Comments are requested on the content and scope of the following SDTL components:

SDTL Command Language
Function Library
Pseudo-Code Library
Documentation

The starting point for the review is the SDTL User Guide.

What is SDTL?

Structured Data Transformation Language (SDTL) is an independent intermediate language for representing data transformation commands. Statistical analysis packages (e.g., SPSS, Stata, SAS, and R) provide similar functionality, but each one has its own language and syntax. SDTL consists of JSON schemas for common operations, such as RECODE, MERGE FILES, and VARIABLE LABELS. SDTL provides machine-actionable descriptions of variable-level data transformation histories derived from any data transformation language. Provenance metadata represented in SDTL can be added to documentation in Data Documentation Initiative (DDI) and other metadata standards.
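
To make the idea of a machine-actionable command description concrete, the sketch below represents a recode operation as a JSON-serializable structure, applies it to a few records, and renders a human-readable summary. It is only an illustration: the key names (`command`, `source_variable`, `target_variable`, `rules`) and the helper functions `apply_recode` and `describe` are hypothetical and are not taken from the SDTL JSON schemas, the Function Library, or the Pseudo-Code Library.

```python
import json

# Hypothetical, simplified command description. The key names are
# illustrative assumptions, NOT the actual SDTL schema.
recode_command = {
    "command": "Recode",
    "source_variable": "age",
    "target_variable": "age_group",
    "rules": [
        {"from": [0, 17], "to": 1},
        {"from": [18, 64], "to": 2},
        {"from": [65, 120], "to": 3},
    ],
}


def apply_recode(command, rows):
    """Return new rows with the target variable derived from the source variable."""
    out = []
    for row in rows:
        value = row[command["source_variable"]]
        new_value = None  # values that match no rule are left missing
        for rule in command["rules"]:
            low, high = rule["from"]
            if low <= value <= high:
                new_value = rule["to"]
                break
        out.append({**row, command["target_variable"]: new_value})
    return out


def describe(command):
    """Render the command as a short human-readable sentence."""
    parts = [
        f"{rule['from'][0]}-{rule['from'][1]} -> {rule['to']}"
        for rule in command["rules"]
    ]
    return (
        f"Recode {command['source_variable']} into "
        f"{command['target_variable']}: " + "; ".join(parts)
    )


rows = [{"age": 12}, {"age": 40}, {"age": 70}]
recoded = apply_recode(recode_command, rows)

# The same structure can be serialized and exchanged between tools.
serialized = json.dumps(recode_command, indent=2)
summary = describe(recode_command)
```

Because the description is plain data rather than package-specific syntax, the same structure could in principle be produced from a script in any of the statistical packages mentioned above and consumed by any tool that reads JSON.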

Role of SDTL in DDI

SDTL greatly enhances the value of DDI by providing a key component of an automated metadata production process. Currently, DDI metadata is almost always created by data repositories rather than by data producers. Even when data are born digital, data producers discard provenance information that could be transported into DDI, because they perform data management and variable transformations in statistical packages that offer minimal metadata capabilities. SDTL and the tools created by the C2Metadata Project are designed to create a metadata life cycle that parallels the research data life cycle. The same scripts that are used to transform and manage variables and data files can be used to update metadata files. As a result, data producers can create more accurate and complete DDI metadata with less time and effort, both for themselves and for data repositories.
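
The idea that transformation scripts can also drive metadata updates can be sketched as follows. This is a minimal illustration under stated assumptions: the command dictionaries, the metadata layout, and the function `update_metadata` are all hypothetical, and they do not reproduce the DDI format, the SDTL schemas, or the C2Metadata tools.

```python
# Hypothetical variable-level metadata, keyed by variable name.
metadata = {
    "age": {"label": "Age in years"},
    "income": {"label": "Annual income"},
}

# Hypothetical command descriptions, as might be extracted from a script.
commands = [
    {"command": "Recode", "source_variables": ["age"], "target_variable": "age_group"},
    {"command": "Compute", "source_variables": ["income", "age_group"], "target_variable": "income_index"},
]


def update_metadata(metadata, commands):
    """Return a copy of the metadata with one provenance entry per command."""
    updated = {name: dict(info) for name, info in metadata.items()}
    for cmd in commands:
        target = cmd["target_variable"]
        # Create a metadata record for variables the script introduces.
        entry = updated.setdefault(target, {"label": "", "derivation": []})
        # Build a new list so the input metadata is never mutated.
        entry["derivation"] = entry.get("derivation", []) + [
            {"command": cmd["command"], "sources": cmd["source_variables"]}
        ]
    return updated


updated = update_metadata(metadata, commands)
```

In this sketch each derived variable carries a record of the command and source variables that produced it, which is the kind of variable-level provenance that would otherwise be lost when the script is run only inside a statistical package.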

Areas of interest for review:

Use cases that test SDTL, especially examples that demonstrate the usefulness of SDTL to the DDI community
Exploring potential links between SDTL and other metadata formats and provenance initiatives