The Sensing and Acquisition subsystem provides data processing capabilities by applying lower-level Data Management (DM) capabilities. In particular, it enables the definition of data processes that derive information from less refined information on a continuous streaming basis, for instance to generate derived science data products.
Figure 1 illustrates data processing built on the DM subsystem capabilities of Data Streaming and Data Transformation. Data sources, such as physical instrument devices, produce one or more data streams. For OOI instruments, ION persists all direct instrument output in its raw form, as sequences of bytes. In addition, the instrument-specific agents/drivers parse the raw instrument output, such as measurement records, and repackage it into the OOI Common Data and Metadata Model, which is suitable for further processing.
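The two paths described above (persist the raw bytes, then parse them into records) can be sketched as follows. This is a minimal illustration only: the comma-separated line format, the `SampleRecord` fields, and the `handle_raw_output` function are hypothetical and do not represent an actual ION driver interface or the OOI Common Data and Metadata Model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleRecord:
    """Hypothetical parsed record (stand-in for the common data model)."""
    timestamp: float
    temperature: float
    conductivity: float


def handle_raw_output(raw: bytes, raw_archive: List[bytes]) -> List[SampleRecord]:
    """Persist raw instrument bytes unchanged, then parse them into records.

    Assumes (for illustration) one ASCII line per sample:
    "<timestamp>,<temperature>,<conductivity>".
    """
    # Path 1: keep the direct instrument output exactly as received.
    raw_archive.append(raw)

    # Path 2: parse the same bytes into structured sample records.
    records = []
    for line in raw.decode("ascii").splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between samples
        ts, temp, cond = line.split(",")
        records.append(SampleRecord(float(ts), float(temp), float(cond)))
    return records
```

The raw archive is written before parsing so that a parsing failure never causes loss of the original instrument output.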
All distinguishable outputs are separated into data streams. These data streams are known and created before the device is turned on. Once the device is sampling, individual packets of one or more sample records are published on these data streams in real time, or as fast as the communication medium (e.g. an intermittent satellite connection) allows. Messages are routed via the DM PubSub services, which are based on the COI Exchange. Transform processes are connected to one or more input streams (via the Data Process Management Service) and compute output packets in real time on defined output streams.
The entire graph of streams and transform processes realizes a data flow of data updates produced in real time.
Figure 1. Data Processing applied to generate a data flow (OV-1)
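The data flow of streams and transform processes can be approximated with a small in-memory publish/subscribe sketch. The `PubSub` class, the `connect_transform` helper, and the stream names are assumptions made for illustration; they stand in for the DM PubSub services and the Data Process Management Service and are not their actual interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PubSub:
    """Hypothetical in-memory stand-in for stream-based message routing."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, stream: str, callback: Callable) -> None:
        self._subscribers[stream].append(callback)

    def publish(self, stream: str, packet: dict) -> None:
        # Deliver the packet to every subscriber of this stream.
        for callback in list(self._subscribers[stream]):
            callback(packet)


def connect_transform(bus: PubSub, input_streams: List[str],
                      output_stream: str, func: Callable[[dict], dict]) -> None:
    """Attach a transform to its input streams; results go to output_stream."""
    def on_packet(packet: dict) -> None:
        bus.publish(output_stream, func(packet))

    for stream in input_streams:
        bus.subscribe(stream, on_packet)


# Build a two-stage flow: raw counts -> degrees Celsius -> kelvin.
bus = PubSub()
received: List[dict] = []

connect_transform(bus, ["temp_counts"], "temp_celsius",
                  lambda p: {"t": p["t"], "value": p["value"] / 100.0})
connect_transform(bus, ["temp_celsius"], "temp_kelvin",
                  lambda p: {"t": p["t"], "value": p["value"] + 273.15})
bus.subscribe("temp_kelvin", received.append)

# Publishing one packet on the source stream drives the whole graph.
bus.publish("temp_counts", {"t": 0, "value": 1250})
```

Each transform subscribes only to its named input streams, so new derived products can be added by connecting further transforms without changing the upstream publishers.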
Information about the algorithm or script that processes the data is stored with the data process resource, not the data transform. Data process scripts are persisted in the ION system by an OOI developer during R2 development time (not at run time). These scripts may be stored in a package format (such as an egg or JAR file) or potentially as source code. In the R2 release there are no user-provided process definitions; all process definitions are created at development time, which simplifies validation of the scripts. A specific execution engine (e.g. Matlab) will need to verify that a process definition is valid.
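As an illustration of engine-side validation, the following sketch checks a process definition supplied as Python source. It assumes, purely for the example, that "valid" means the source parses and defines a top-level function named `process`; the function name `validate_process_source` and the entry-point convention are hypothetical, and a real execution engine would apply its own rules.

```python
import ast
from typing import List


def validate_process_source(source: str, entry_point: str = "process") -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    # Collect the names of functions defined at module level.
    defined = {node.name for node in tree.body
               if isinstance(node, ast.FunctionDef)}
    if entry_point not in defined:
        return [f"missing top-level function '{entry_point}'"]
    return []
```

Because the source is only parsed and never executed, this check is safe to run when a definition is registered.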
Data processes may use ancillary information, such as a table of constants based on model type. This information can be attached to the process definition and updated later if necessary. The format of the information is assumed to be consistent with the expectations of the algorithm.
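One way to picture ancillary information attached to a process definition is sketched below. The `ProcessDefinition` class, the `linear_calibration` algorithm, and the model name and constants are all hypothetical; the sketch only shows a constants table keyed by model type that can be replaced later without changing the algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ProcessDefinition:
    """Hypothetical process definition carrying an ancillary constants table."""
    name: str
    algorithm: Callable[[float, Dict[str, float]], float]
    ancillary: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def update_ancillary(self, model: str, constants: Dict[str, float]) -> None:
        # Replace the constants for one model type; the algorithm is untouched.
        self.ancillary[model] = dict(constants)

    def run(self, model: str, value: float) -> float:
        # Look up the constants for this model type and apply the algorithm.
        return self.algorithm(value, self.ancillary[model])


def linear_calibration(value: float, constants: Dict[str, float]) -> float:
    """Example algorithm: expects 'gain' and 'offset' in the constants."""
    return constants["gain"] * value + constants["offset"]


definition = ProcessDefinition(
    name="linear_calibration",
    algorithm=linear_calibration,
    ancillary={"model_a": {"gain": 2.0, "offset": 1.0}},
)
before = definition.run("model_a", 3.0)

# Later, the constants are updated without redefining the process.
definition.update_ancillary("model_a", {"gain": 2.0, "offset": 0.5})
after = definition.run("model_a", 3.0)
```

The algorithm fails with a `KeyError` if the table lacks an expected key, which reflects the assumption that the ancillary format matches what the algorithm expects.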