Acquire external dataset content, ingest into Integrated Observatory catalogs, and publish as appropriate
|Actors||Data Registrant (a Registered User), Integrated Observatory Operator, Data Process Programmer|
|References||UC.R1.03 Hello Data Source|
|Uses||UC.R2.21 Transform Data in Workflow (conversion of external data into Integrated Observatory common model), UC.R2.23 Ingest Data Stream Supplement (notification mechanisms for ingestion in Integrated Observatory)|
|Is Used By||UC.R2.29 Integrate External Dataset, UC.R2.61 Reacquire External Data|
|Is Extended By|
|In Acceptance Scenarios||AS.R2.02B Data Support via Cruise, AS.R2.03A Modelers Integrate External Model with OOI|
|UC Status||Mapped + Ready|
The data from the external data source is read into the Integrated Observatory and, where appropriate, ingested (transformed into the observatory's common data model). The data is cached for a limited period, not indefinitely; once it is no longer cached, it must be re-requested if it is needed (UC.R2.61 Reacquire External Data). The acquired data is made available to subscribers and other users (consistent with policy), at least until it passes out of the cache. The metadata is persisted for an extended period, possibly indefinitely, again per policy.
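The caching behavior described above can be sketched as follows. This is a minimal illustration, not the Integrated Observatory implementation: the class name `ExternalDataCache`, its methods, and the `ttl_seconds` parameter are all hypothetical, and the actual retention periods are set by policy.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExternalDataCache:
    """Hypothetical cache: data expires after a period, metadata is kept."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds      # assumed policy-driven retention period
        self._clock = clock                 # injectable so expiry can be tested
        self._data: Dict[str, Tuple[float, bytes]] = {}
        self._metadata: Dict[str, dict] = {}

    def store(self, dataset_id: str, payload: bytes, metadata: dict) -> None:
        """Cache the acquired data and persist its metadata separately."""
        self._data[dataset_id] = (self._clock(), payload)
        self._metadata[dataset_id] = dict(metadata)

    def get_data(self, dataset_id: str) -> Optional[bytes]:
        """Return cached data, or None if it was never stored or has expired."""
        entry = self._data.get(dataset_id)
        if entry is None:
            return None
        stored_at, payload = entry
        if self._clock() - stored_at > self.ttl_seconds:
            # Expired: the caller has to reacquire the data (UC.R2.61).
            del self._data[dataset_id]
            return None
        return payload

    def get_metadata(self, dataset_id: str) -> Optional[dict]:
        """Metadata outlives the cached data in this sketch."""
        return self._metadata.get(dataset_id)
```

A caller that receives `None` from `get_data` would fall back to the reacquisition use case, while `get_metadata` keeps answering after the data itself has left the cache.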
- A dataset agent exists and is operational.
- The minimum set of metadata has been provided to establish the necessary connections and define the provenance of the acquired data.
- Data from external data sources are only cached for a period of time, not kept indefinitely.
- In R2, if version information is provided for the external dataset, the Integrated Observatory system should track that information.
- The Integrated Observatory Operator has the ability to change parameters and permissions for related resources as needed and appropriate. The data provider (authorized representative of the external dataset; may be the Integrated Observatory Operator) has the ability to change certain parameters and permissions.
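One possible way to track provider-supplied version information, as assumed above, is a simple per-dataset version log. The use case leaves the actual technique to the architecture, so the class `DatasetVersionLog` and its fields are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetVersionLog:
    """Hypothetical record of the versions seen for one external dataset."""
    dataset_id: str
    versions: List[str] = field(default_factory=list)

    def record(self, version: Optional[str]) -> bool:
        """Record a version reported by the provider; return True if it is new."""
        if version is None:
            # The provider supplied no version information: nothing to track.
            return False
        if self.versions and self.versions[-1] == version:
            # Same version as the last acquisition: no change.
            return False
        self.versions.append(version)
        return True

    @property
    def current(self) -> Optional[str]:
        """Most recently recorded version, or None if none was ever provided."""
        return self.versions[-1] if self.versions else None
```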
The external dataset is registered and validated.
- Upon being notified of, or discovering via polling, new data from the external dataset, the Integrated Observatory system reads the additional data.
- See UC.R2.23 Ingest Data Stream Supplement.
- The dataset agent manages a synoptic notion of time as needed, for example tracking the delta between time at the data source and the Integrated Observatory time (in case the data source time develops serious offsets, as sometimes happens).
- The specific technique for tracking version information depends on the architectural implementation and is not detailed in this use case; however, versions should be indicated in the acquired data.
- <3> Dataset agent updates operational metadata for each dataset.
- Such as last update time, status/state of health, number and size of supplements received, and life cycle state.
- <3> Users can review the status of the external dataset and its acquisition of data.
- Using the metadata updated by the dataset agent.
- The received data is distributed in its raw form within the Integrated Observatory, and made available externally (per policy), along with its related metadata.
- A dataset that is not approved for release may not be visible externally to non-operators. (This is expected to be an exceptional state.)
- Each Integrated Observatory dataset receives a unique identifier that end users can use to find, get information about, and download or subscribe to the dataset (assuming they have those permissions).
- The Integrated Observatory system parses incoming records and creates new records (in a separate dataset) in the Observatory's canonical data format.
- The agent/driver may collect multiple data records in one message.
- Each data message is published to the ION exchange with appropriate metadata.
- The records are associated with the new dataset so as to enable a contiguous set of data records.
- If desired, the Data Registrant specifies additional algorithms to convert acquired data into additional data products; a Data Process Programmer implements these transforms.
- Transformations can be added before or after data starts arriving.
- See UC.R2.21 Transform Data in Workflow.
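The acquisition steps above (poll for new data, track the time delta to the source, update operational metadata, and publish raw and parsed records in one message) can be sketched roughly as below. Everything here is an assumption made for illustration: the names `Supplement`, `OperationalMetadata`, `DatasetAgent`, `parse_record`, the `name=value` record layout, and the five-minute offset threshold are hypothetical and do not describe the actual dataset agent or ION exchange interfaces.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass
class Supplement:
    """New data read from the external source (hypothetical structure)."""
    source_time: datetime          # timestamp reported by the data source
    records: List[str]             # raw records, assumed to look like "name=value"
    version: Optional[str] = None  # version information, if the provider gives any


@dataclass
class OperationalMetadata:
    last_update: Optional[datetime] = None
    supplements_received: int = 0
    bytes_received: int = 0
    clock_offset: timedelta = timedelta(0)
    status: str = "OK"


def parse_record(raw: str) -> Dict[str, float]:
    """Stand-in for conversion into the canonical format."""
    name, value = raw.split("=", 1)
    return {name.strip(): float(value)}


@dataclass
class DatasetAgent:
    poll: Callable[[], Optional[Supplement]]   # returns None when nothing is new
    publish: Callable[[dict], None]            # stand-in for publishing a message
    max_offset: timedelta = timedelta(minutes=5)  # assumed threshold
    metadata: OperationalMetadata = field(default_factory=OperationalMetadata)

    def run_once(self, now: datetime) -> bool:
        """Run one acquisition cycle; return True if new data was handled."""
        supplement = self.poll()
        if supplement is None:
            return False
        # Track the delta between source time and observatory time.
        offset = now - supplement.source_time
        self.metadata.clock_offset = offset
        self.metadata.status = (
            "OK" if abs(offset) <= self.max_offset else "CLOCK_OFFSET"
        )
        # Update operational metadata that users can later review.
        self.metadata.last_update = now
        self.metadata.supplements_received += 1
        self.metadata.bytes_received += sum(
            len(r.encode()) for r in supplement.records
        )
        # One message carries several records, both raw and parsed.
        self.publish({
            "raw": supplement.records,
            "canonical": [parse_record(r) for r in supplement.records],
            "version": supplement.version,
        })
        return True
```

In this sketch, users reviewing the status of the external dataset would read the `metadata` attribute, which the agent refreshes on every successful cycle.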
Data from the external dataset is received and distributed within the system, and to end users as appropriate.
Several protocols will need to be supported by the end of Release 2; developing that list is a part of the release development process.