A data process produces new data from existing data
|Actors||Data Process Programmer|
|Uses||UC.R2.47 Define Executable Process, UC.R2.48 Schedule Process for Execution|
|Is Used By||UC.R2.18 Visualize Data Product|
|Is Extended By||UC.R2.03 Produce Real-Time Calibrated Data|
|In Acceptance Scenarios||AS.R2.02C Instrument Life Cycle Support, AS.R2.04A Data Product Leads Drive Core Data Product Creation, AS.R2.03A Modelers Integrate External Model with OOI|
|Technical Notes||Transformations result in new data; existing data are never deleted.|
|Primary Service||Data Processing Services|
|UC Status||Mapped + Ready|
This information summarizes the Use Case functionality.
A data process definition exists that describes an algorithm to transform input data into output data. It is executed as a data process in an execution engine. The data process is executed based on defined triggers (time, event, user request, or other condition), thereby effecting a Workflow. A data process can be chained to a data stream (i.e., executed as a consequence of a new message arriving) or run independently of data streams (started at certain intervals or times, or when a user requests it).
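The two execution modes described above (chained to a stream vs. triggered independently) can be sketched as follows. All class and function names here are illustrative, not part of the actual ION/OOI API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DataStream:
    """A stream that data processes can subscribe to (hypothetical)."""
    subscribers: List[Callable] = field(default_factory=list)

    def publish(self, message):
        # Each new message triggers every chained data process.
        for notify in self.subscribers:
            notify(message)

@dataclass
class DataProcess:
    """Executable process instantiated from a data process definition."""
    transform: Callable
    output: DataStream = field(default_factory=DataStream)

    def chain_to(self, source: DataStream):
        # Stream-chained mode: executed whenever a new message arrives.
        source.subscribers.append(self.execute)

    def execute(self, message):
        # Also callable directly, for time- or user-triggered execution.
        result = self.transform(message)
        self.output.publish(result)

# Usage: chain a (made-up) calibration transform to a raw stream.
raw = DataStream()
calibrate = DataProcess(transform=lambda counts: counts * 0.01)
calibrate.chain_to(raw)
calibrated = []
calibrate.output.subscribers.append(calibrated.append)
raw.publish(1500)   # new message triggers the data process
print(calibrated)   # [15.0]
```

Running the process on a timer or on user request would simply call `execute` directly instead of relying on `publish`.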
- The workflow in R2 is manually composed by the developer, not coordinated by a Workflow engine.
- In R2, an Integrated Observatory Operator may serve as the Data Process Programmer.
- A data transform process definition is defined in UC.R2.47 Define Executable Process
- Each instrument in OOI can give rise to data products at multiple data levels: raw, Level 0 (Unprocessed Data), Level 1 (Basic Data), and Level 2 (Derived Data).
- Each data transform is atomic by nature. Composition is performed by (manually) chaining data transforms.
- There are several parts to the data transformation process: the data process definition, the executable data transformation process that results from instantiating a data process definition, and the engine that executes the data process definition.
- Data process definitions are resources, and can be annotated, discovered, versioned, etc. like any other resource.
- ION governance precludes the data process services from executing malicious or harmful code (for example, code that consumes all ION compute resources).
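The manual composition of atomic transforms noted above can be sketched as plain function chaining; the names below are hypothetical, not OOI code:

```python
from functools import reduce
from typing import Callable, Iterable

def chain_transforms(transforms: Iterable[Callable]) -> Callable:
    """Compose atomic transforms left-to-right into one pipeline."""
    return lambda data: reduce(lambda acc, t: t(acc), list(transforms), data)

# Example chain: raw record -> Level 0 (unprocessed) -> Level 1 (basic).
strip_header = lambda rec: rec["counts"]      # L0: pull raw counts
apply_calibration = lambda c: c * 0.5 + 2.0   # L1: made-up calibration

pipeline = chain_transforms([strip_header, apply_calibration])
print(pipeline({"counts": 10}))  # 7.0
```

Each step stays atomic; the workflow exists only in how the developer wires the steps together, matching the manual composition described for R2.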
The Data Process Programmer or a sufficiently privileged user is ready to enter a data transform definition.
- Data Process Programmer (DPP) creates a data process definition specifying a transformation.
- See UC.R2.47 Define Executable Process
- Includes defining data process definition source and destination data characteristics.
- Some specification of input constraints may be necessary.
- DPP selects data process definition: The system can help the user to find an appropriate data process definition, e.g. by starting from a source data product and displaying all available data process definitions.
- DPP defines the actual data transformation (execution process) to take place
- Extends UC.R2.48 Schedule Process for Execution
- This can include a reference to specific input and output data products.
- If multiple input products are required, collapsing them into a single stream with an upstream transform process will simplify the R2 implementation.
- The set of 'standard outputs' available to a transform should include error messages/outputs.
- The DPP configures the data process definition execution frequency.
- The DPP defines the specific data product (including metadata) which will result.
- The data process definition has sufficient information to make it possible to register the resulting data set (which remains empty until the data process definition execution is enabled).
- The definition includes the name and data structure, and other metadata attributes, of the resulting data set.
- The system registers stream and data set resource definitions for the output, with metadata based on the provided attributes.
- The Data Product could be created in this step or the next.
- The DPP activates the data transformation process.
- The system enables execution of the transformation based on the data process definition.
- Internally, this means that a process will be prepared for execution and started, where it will wait for event arrival.
- The appropriate execution trigger (time, data, or other event) causes the data process to be executed, creating an update to the resulting data product.
- As with all other executing processes, the Integrated Observatory tracks process execution (frequency, CPU time spent, elapsed time, and so on). This information can be inspected (not a part of this use case).
- Real-time metadata for the produced resources are updated automatically by the data process.
- The transformation process issues a message (in addition to publishing resulting data) to announce each execution's completion.
- This is a system monitoring function (performed automatically or manually, but external to the script).
- If execution is not successful, the Integrated Observatory system should indicate how to get help.
- Data transformation process continues operating until deactivated.
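The scenario steps above (define a process, activate it, let a trigger cause execution, and track execution statistics) can be sketched as follows; all names are illustrative, not actual ION services:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class DataProcessDefinition:
    """Registered, discoverable resource describing a transformation."""
    name: str
    transform: Callable

@dataclass
class ExecutingProcess:
    """Process instantiated from a definition; waits for trigger events."""
    definition: DataProcessDefinition
    active: bool = False
    products: List = field(default_factory=list)
    stats: Dict = field(default_factory=lambda: {"executions": 0, "elapsed": 0.0})

    def activate(self):
        self.active = True   # prepared and started; now waits for events

    def on_trigger(self, data):
        # Time, data, or other event causes the data process to execute.
        if not self.active:
            return
        start = time.perf_counter()
        self.products.append(self.definition.transform(data))
        # The system tracks execution (frequency, elapsed time, ...).
        self.stats["executions"] += 1
        self.stats["elapsed"] += time.perf_counter() - start

defn = DataProcessDefinition("example_l1", lambda d: d * 2)
proc = ExecutingProcess(defn)
proc.activate()
proc.on_trigger(10)               # trigger causes one execution
print(proc.products)              # [20]
print(proc.stats["executions"])   # 1
```

The process keeps responding to triggers until deactivated, matching the end state of the scenario.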
The data transform definition is registered, is executing successfully according to the prescribed schedule, and is producing the desired products.
These comments provide additional context (usually quite technical) for editors of the use case.
DM infrastructure is essential to carry out the transformation.
The tuning of workflow operations is not considered in Release 2.
A transformation chain can be tested on a subset of data by making the first step of the chain select only that subset. When the chain should operate on the complete data set, modify the first data process definition and restart the chain.
Any individual data process definition can perform computations of arbitrary complexity, and can choose to not generate a product, which effectively terminates any workflow processing chain.
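A minimal sketch of a transform declining to generate a product, which terminates the processing chain (hypothetical names):

```python
def run_chain(transforms, data):
    """Run transforms in order; a None result ends the chain early."""
    for t in transforms:
        data = t(data)
        if data is None:          # no product generated -> chain ends
            return None
    return data

quality_gate = lambda x: x if x >= 0 else None   # drop bad samples
scale = lambda x: x * 10

print(run_chain([quality_gate, scale], 4))    # 40
print(run_chain([quality_gate, scale], -1))   # None (chain terminated)
```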
The user-visible behavior is a small subset of the steps provided in the scenario.
It is ambiguous whether to treat a data process that generates data, without requiring an input, as a data transform. Allowing that possibility as a degenerate case seems reasonable, but there may be reasons to consider it separately and call it "Generate Data" rather than Transform Data.
For visualization workflows, the following comments are provided:
In R3, visualization transformations will occur within an extensible framework for integrating data visualization tools (specifying standard formats, intermediate processing steps, and a configurable pipeline, or workflow, for generating a visualization from a dataset), together with a workflow framework that supports developer-created visualization workflows for data products based on measurement data, delivering graphics data to visualization applications.
OOI will develop some specialized visualization transformations, e.g., for Google Earth and MATLAB, though not necessarily in Release 2.