
Overview of "Acquire Data From External Dataset" Use Case

Acquire external dataset content, ingest into Integrated Observatory catalogs, and publish as appropriate

Tip: Key Points
UC Priority = 4 or 5: Critical, is in R2
Only boldface steps are required
<#> before a step -> lower priority
(optional) -> run-time option



Refer to the Product Description and Product Description Release 2 pages for metadata definitions.

Actors: Data Registrant (a Registered User), Integrated Observatory Operator, Data Process Programmer
References: UC.R1.03 Hello Data Source
Uses:
  • UC.R2.21 Transform Data in Workflow (conversion of external data into Integrated Observatory common model)
  • UC.R2.23 Ingest Data Stream Supplement (notification mechanisms for ingestion in Integrated Observatory)
Is Used By:
  • UC.R2.29 Integrate External Dataset
  • UC.R2.61 Reacquire External Data
Is Extended By:
In Acceptance Scenarios: AS.R2.02B Data Support via Cruise, AS.R2.03A Modelers Integrate External Model with OOI
Technical Notes: Definitions:
  • external data provider: organization or person offering the data set
  • data source: system (implicitly, service) providing the data set
  • external dataset: connection through which particular information collection is offered, and description of that information collection
  • external dataset content: actual bits that make up the information collection (analogous to our internal data set content)
  • external dataset connection: the endpoint through which a particular information collection is offered
Lead Team: EOI
Primary Service:
Version: 3.3
UC Priority: 5
UC Status: Mapped + Ready
UX Exposure: EUC


The data from the external data source is read into the Integrated Observatory and, if appropriate, ingested (transformed into the observatory's common data model). The data is cached for a limited period, not indefinitely; once it is no longer cached, it must be re-requested if it is needed (UC.R2.61 Reacquire External Data). The acquired data is made available to subscribers and other users (consistent with policy), at least until it passes out of the cache. The metadata is persisted for an extended period, possibly indefinitely, again per policy.
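
The caching behavior described above can be illustrated with a minimal sketch. It is not the Integrated Observatory implementation: the class name ExternalDataCache, its methods, and the ttl_seconds parameter are hypothetical names chosen for illustration. The sketch assumes content expires after a fixed period while metadata is kept.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ExternalDataCache:
    """Hypothetical cache: content expires after ttl_seconds, metadata is kept."""

    ttl_seconds: float
    # Injectable clock so the expiry logic can be exercised without waiting.
    clock: Callable[[], float] = time.monotonic
    _content: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)
    _metadata: Dict[str, dict] = field(default_factory=dict)

    def store(self, dataset_id: str, content: bytes, metadata: dict) -> None:
        """Cache the acquired content and persist its metadata."""
        self._content[dataset_id] = (self.clock(), content)
        self._metadata[dataset_id] = dict(metadata)

    def get_content(self, dataset_id: str) -> Optional[bytes]:
        """Return cached content, or None if it was never stored or has expired.

        A None result means the caller has to reacquire the data
        (the situation UC.R2.61 covers).
        """
        entry = self._content.get(dataset_id)
        if entry is None:
            return None
        stored_at, content = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            del self._content[dataset_id]  # content has passed out of the cache
            return None
        return content

    def get_metadata(self, dataset_id: str) -> Optional[dict]:
        """Metadata outlives the cached content in this sketch."""
        return self._metadata.get(dataset_id)
```

In this sketch a caller that receives None from get_content would trigger reacquisition, while get_metadata continues to answer provenance questions after the content has expired. The actual retention periods are a matter of policy and are not specified here.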


  • A dataset agent exists and is operational.
  • The minimum set of metadata has been provided to establish the necessary connections and define the provenance of the acquired data.
  • Data from external data sources are only cached for a period of time, not kept indefinitely.
  • In R2, if version information is provided for the external dataset, the Integrated Observatory system should track that information.
  • The Integrated Observatory Operator has the ability to change parameters and permissions for related resources as needed and appropriate. The data provider (authorized representative of the external dataset; may be the Integrated Observatory Operator) has the ability to change certain parameters and permissions.

Initial State

External dataset is registered and validated.

Scenario for "Acquire Data From External Dataset" Use Case

  1. Upon being notified of, or discovering via polling, new data from the external dataset, the Integrated Observatory system reads the additional data.
    1. See UC.R2.23 Ingest Data Stream Supplement.
    2. The dataset agent manages a synoptic notion of time as needed, for example tracking the delta between time at the data source and the Integrated Observatory time (in case the data source time develops serious offsets, as sometimes happens).
    3. The specific technique for tracking version information depends on the architectural implementation and is not detailed in this use case; however, versions should be indicated in the acquired data.
  2. <3> Dataset agent updates operational metadata for each dataset.
    1. Such as last update time, status/state of health, number and size of supplements received, and life cycle state.
  3. <3> Users can review the status of the external dataset and its acquisition of data.
    1. Using the metadata updated by the dataset agent.
  4. The received data is distributed in its raw form within the Integrated Observatory, and made available externally (per policy), along with its related metadata.
    1. A dataset that is not approved for release may not be externally visible to non-operators. (This is expected to be an exceptional state.)
    2. Each Integrated Observatory dataset receives a unique identifier that end users can use to find, get information about, and download or subscribe to the dataset (assuming they have those permissions).
  5. The Integrated Observatory system parses incoming records and creates new records (in a separate dataset) in the Observatory's canonical data format.
    1. The agent/driver may collect multiple data records in one message.
    2. Each data message is published to the ION exchange with appropriate metadata.
    3. The records are associated with the new dataset so as to enable a contiguous set of data records.
  6. If desired, Data Registrant specifies additional algorithms to convert acquired data into additional data products; a Data Process Programmer effects these transforms.
    1. Transformations can be added before or after data starts arriving.
    2. See UC.R2.21 Transform Data in Workflow.
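
Steps 1 and 2 of the scenario (reading new data, tracking the time delta against the data source, noting version information, and updating operational metadata) might look roughly like the following sketch. All names here (Supplement, OperationalMetadata, DatasetAgent, fetch_new, max_offset) are hypothetical and are not taken from the Integrated Observatory code base; the five-minute offset threshold and the status strings are arbitrary assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Optional


@dataclass
class Supplement:
    """One batch of new content read from the external dataset (hypothetical)."""

    payload: bytes
    source_time: datetime  # timestamp reported by the data source
    version: Optional[str] = None  # version information, if the provider supplies it


@dataclass
class OperationalMetadata:
    """Operational metadata of the kind listed under step 2 (assumed fields)."""

    last_update: Optional[datetime] = None
    supplements_received: int = 0
    bytes_received: int = 0
    clock_offset: timedelta = timedelta(0)
    last_version: Optional[str] = None
    status: str = "OK"


@dataclass
class DatasetAgent:
    """Sketch of an agent that polls for new data and records what it saw."""

    fetch_new: Callable[[], List[Supplement]]  # stands in for polling or notification
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc)
    max_offset: timedelta = timedelta(minutes=5)  # assumed threshold
    metadata: OperationalMetadata = field(default_factory=OperationalMetadata)

    def poll_once(self) -> List[Supplement]:
        """Read any new supplements and update the operational metadata."""
        supplements = self.fetch_new()
        for supplement in supplements:
            observed = self.now()
            # Delta between observatory time and the time the source reports.
            self.metadata.clock_offset = observed - supplement.source_time
            self.metadata.last_update = observed
            self.metadata.supplements_received += 1
            self.metadata.bytes_received += len(supplement.payload)
            if supplement.version is not None:
                self.metadata.last_version = supplement.version
            # Flag a serious offset so users reviewing status can see it.
            if abs(self.metadata.clock_offset) > self.max_offset:
                self.metadata.status = "CLOCK_OFFSET"
            else:
                self.metadata.status = "OK"
        return supplements
```

The metadata object is what step 3 would expose to users reviewing the status of the external dataset. The sketch assumes timezone-aware timestamps on both sides.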

Final State

Data from the external dataset is received and distributed within the system, and to end users as appropriate.
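
Step 5 of the scenario (parsing incoming records into canonical records and collecting several records into one message) can be sketched as follows. The raw "time,value" line layout, the canonical field names, and the message dictionary are assumptions made for illustration only; the use case does not define the canonical data format, and publishing to the ION exchange is left out.

```python
from typing import Dict, Iterable, Iterator, List


def parse_record(raw_line: str) -> Dict[str, object]:
    """Parse one raw 'time,value' line into a canonical record.

    Both the raw layout and the canonical field names are hypothetical.
    """
    time_str, value_str = raw_line.strip().split(",")
    return {"time": time_str, "value": float(value_str)}


def batch_messages(
    raw_lines: Iterable[str], dataset_id: str, batch_size: int
) -> Iterator[Dict[str, object]]:
    """Group parsed records into messages of at most batch_size records.

    Each message carries the target dataset id and the index of its first
    record, so a consumer can reassemble a contiguous set of records.
    """
    batch: List[Dict[str, object]] = []
    first_index = 0
    for index, line in enumerate(raw_lines):
        if not batch:
            first_index = index
        batch.append(parse_record(line))
        if len(batch) == batch_size:
            yield {"dataset_id": dataset_id, "first_index": first_index, "records": batch}
            batch = []
    if batch:  # emit the final, possibly short, message
        yield {"dataset_id": dataset_id, "first_index": first_index, "records": batch}
```

Each yielded message would then be handed to whatever publishing mechanism the system provides, along with the appropriate metadata.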


Several protocols will need to be supported by the end of Release 2; developing that list is a part of the release development process.
