Dataset agents (also known as data agents or external dataset agents) act as data producers (see CIAD MI Dataset Agent Data Acquisition and Ingestion) in the OOINet system for OOI instruments and platforms that provide data in electronic form. They play a role similar to that of instrument and platform agents, but lack the ability to command and control the device itself. Dataset agents can also be used in an external observatory integration (EOI) context, pulling data from external non-OOI data sources. In general, dataset agents apply in two scenarios:
- Representing OOI instrumentation without direct connection or CI presence (e.g. gliders, subsurface moorings, disconnected instruments) for which data arrives in file form from the Marine IOs.
- Representing external data sources via the External Observatory Integration (EOI) effort, e.g. NOAA, IOOS, Neptune CA data sources
Dataset agent responsibilities include management, decomposition and packaging of data from the producer and communication of data producer events.
The figure below shows the various ways in which instrument, platform and dataset agents can be deployed.
- Case 1 shows real-time connected OOI instruments for which instrument agents exist. Instrument agents produce granules to be ingested into science coverages and presented as DataProducts to the user. Dataset agents are NOT used in this case. Note: Platform agents are integrated in a similar way, except that they produce engineering data instead of science data.
- Case 2 shows uncabled OOI instruments. Data is received from Marine IOs via a file exchange interface (using rsync or iRODS servers) in file formats defined in IDD documents. Dataset agents publish granules to be ingested into science coverages and presented as DataProducts to the user. Dataset agents exist for every device on the instrument, child node and platform (mooring) levels.
- Case 3 (UNUSED in Release 2) shows external data sources hosting one or multiple external data sets. Dataset agents fetch data at regular intervals from these data sources and publish granules to be presented as DataProducts to the user. These granules may not be ingested and may live only transiently in OOINet for data consumers that require these data products. A data source agent may coordinate all dataset agents that fetch data from the same data source.
Figure 1. Dataset and Instrument Agents (OV-1)
The external dataset agent leverages a base class similar to that of the instrument agent. In fact, in Release 2, it extends the instrument agent base class, adds external dataset agent specialization and removes the parts of the framework that are not relevant. The external dataset agent comes in two flavors:
- Fully commandable external dataset agent. The driver is loaded based on the agent configuration and provides a specialized command set
- Simple external dataset agent. Provides plug-in ability for a poller (how to retrieve new data) and a parser (how to extract new records from new data)
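The poller/parser split of the simple external dataset agent can be illustrated with a minimal sketch. It is not the OOINet plug-in interface: the class names `DirectoryPoller` and `LineParser`, the function `fetch_new_records`, the state keys `last_file` and `position`, and the line-per-record file format are all assumptions made for illustration. The sketch also assumes that file names sort in chronological order.

```python
import fnmatch
import os


class DirectoryPoller:
    """Hypothetical poller: decides which files may contain new data."""

    def __init__(self, directory, pattern):
        self.directory = directory
        self.pattern = pattern

    def poll(self, last_file=None):
        # Assumption: file names sort chronologically.
        names = sorted(
            n for n in os.listdir(self.directory)
            if fnmatch.fnmatch(n, self.pattern)
        )
        if last_file is not None:
            # Re-check the most recently read file, since it may have grown.
            names = [n for n in names if n >= last_file]
        return names


class LineParser:
    """Hypothetical parser: one record per complete line (assumed format)."""

    def parse(self, path, position=0):
        records = []
        with open(path, "rb") as f:
            f.seek(position)
            for line in f:
                if not line.endswith(b"\n"):
                    break  # incomplete trailing line; retry on the next poll
                position += len(line)
                records.append(line.decode("utf-8").rstrip("\r\n"))
        return records, position


def fetch_new_records(poller, parser, state):
    """Run one poll/parse cycle and advance the state in place."""
    records = []
    for name in poller.poll(state.get("last_file")):
        # Resume mid-file only for the file that was read last.
        start = state["position"] if name == state.get("last_file") else 0
        new, end = parser.parse(os.path.join(poller.directory, name), start)
        records.extend(new)
        state["last_file"], state["position"] = name, end
    return records
```

Starting from `state = {"last_file": None, "position": 0}`, each call to `fetch_new_records` returns only records not seen before. Keeping the poller and the parser separate means either one can be swapped out independently, for example a different file format with the same directory scan.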
See here for details:
The figure below shows how the dataset agent is initialized and maintains state. State includes the location to scan for data files, the most recent read file and the position within the file.
Figure 2. Dataset Agent State Management
The following mechanisms apply:
- Through preload or operator action, an ExternalDatasetAgentInstance resource object is created. It contains configuration for the future agent process, such as the directory and file pattern to scan for data files.
- The dataset agent is started by the user. The configuration from the ExternalDatasetAgentInstance resource object is provided to the new agent process on spawn.
- The UI makes a service call to instrument_management, which relays to data_acquisition_management service.
- The agent config builder class assembles a large configuration package for the agent process to spawn. The configuration is stored in the object store
- The newly spawned agent reads the agent configuration object from the object store
- Every time the dataset agent updates its internal state, it updates the persistent process state associated with the process id. See the container state repository.
- When the user stops the agent process, the most recent persistent agent process state is copied into the ExternalDatasetAgentInstance resource object as attribute saved_agent_state
- When a new agent process is started, it receives the prior process's state as part of the spawn configuration.
- The new agent process has a different process id from the prior agent process
- The saved prior agent state is placed into the agent configuration object compiled by the agent config builder
- The newly spawned agent extracts the prior agent state from the spawn configuration and initializes itself accordingly
- The agent process subsequently creates and updates the persistent process state associated with the process id
- If the agent process, the container it is running in, or the entire system fails, the CEI infrastructure eventually restarts the agent process. On restart, the agent process reads the persistent process state and can resume where it left off.
- Note: the restarted agent process keeps the same process id as the process before it failed. It may reside in a different container but it maintains its old process identity
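The state handling described above can be summarized in a small in-memory sketch. It is a simplified model under stated assumptions, not the OOINet implementation: the dictionaries `process_state_repo` and `object_store` stand in for the container state repository and the object store, keying the stored configuration by process id is an assumption, and all function and class names as well as the `/data` directory are hypothetical. Only the `saved_agent_state` attribute name comes from the description above.

```python
import copy
import uuid

process_state_repo = {}  # stand-in for the container state repository
object_store = {}        # stand-in for the object store (keyed by process id: assumption)

EMPTY_STATE = {"last_file": None, "position": 0}


def build_agent_config(agent_instance):
    """Hypothetical config builder: static config plus any saved prior state."""
    config = {"driver_config": copy.deepcopy(agent_instance["driver_config"])}
    if agent_instance.get("saved_agent_state"):
        config["prior_state"] = copy.deepcopy(agent_instance["saved_agent_state"])
    return config


class DatasetAgentProcess:
    def __init__(self, process_id):
        self.process_id = process_id
        persisted = process_state_repo.get(process_id)
        if persisted is not None:
            # Restart after failure: same process id, resume from persisted state.
            self.state = copy.deepcopy(persisted)
        else:
            # Fresh spawn: initialize from the prior state in the spawn config, if any.
            config = object_store[process_id]
            self.state = copy.deepcopy(config.get("prior_state", EMPTY_STATE))
            self._persist()

    def update_state(self, last_file, position):
        self.state = {"last_file": last_file, "position": position}
        self._persist()  # every state change is written through

    def _persist(self):
        process_state_repo[self.process_id] = copy.deepcopy(self.state)


def start_agent(agent_instance):
    """User-initiated start: a new process id is assigned on every start."""
    process_id = uuid.uuid4().hex
    object_store[process_id] = build_agent_config(agent_instance)
    return DatasetAgentProcess(process_id)


def stop_agent(agent_instance, process):
    """User-initiated stop: copy the persisted state into the resource object."""
    agent_instance["saved_agent_state"] = process_state_repo.pop(process.process_id)
    object_store.pop(process.process_id, None)


def restart_after_failure(process_id):
    """Infrastructure-initiated restart: the process id is preserved."""
    return DatasetAgentProcess(process_id)
```

The sketch shows the two recovery paths side by side: a user stop and start carries state through the resource object into a process with a new id, while a failure restart reuses the process id and reads the state directly from the persistent process state.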