
Overview of "Integrate External Dataset" Use Case

*Make a collection of external data continuously available as OOI data.*

Tip: Key Points
UC Priority= 4 or 5: Critical, is in R2
Only boldface steps are required
<#> before a step —> lower priority
(optional) —> run-time option



Refer to the Product Description and Product Description Release 2 pages for metadata definitions.

Actors Data Registrant (a Registered User)
References EOI R2 Data Handler Resources & Objects
Uses UC.R2.01 Acquire Data From External Dataset
Is Used By  
Is Extended By  
In Acceptance Scenarios AS.R2.02B Data Support via Cruise, AS.R2.03A Modelers Integrate External Model with OOI, AS.R2.04A Data Product Leads Drive Core Data Product Creation
Technical Notes This use case extends the referenced R1 Hello Data Source use case. Definitions:
  • external data provider: organization or person offering the data set
  • data source: system (implicitly, service) providing the data set
  • external dataset: connection through which particular information collection is offered, and description of that information collection
  • external dataset content: actual bits that make up the information collection (analogous to our internal data set content)
  • external dataset connection: the endpoint through which a particular information collection is offered
    This use case targets an external dataset that is supplemented (updated) over time.
Lead Team EOI
Primary Service External Data Access Services
Version 3.2
UC Priority 4
UC Status Mapped + Ready
UX Exposure EUC


This information summarizes the Use Case functionality.

An external dataset is registered by an authorized Data Registrant, who represents the External Data Provider. The registrant provides at least the minimum metadata required by the Integrated Observatory system. The metadata specifies the data source, including the system and service providing the data and the protocol that it uses; information about the person or organization providing the data; and information about the dataset content. The Integrated Observatory system automatically connects to the data source, obtaining additional metadata as well as operational and status information. Depending on the results, the system may request additional metadata from the initial registrant, forward the information to an operator for approval (and possibly additional metadata), automatically begin acquiring data from the registered data source, or generate an error. The Integrated Observatory keeps track of provenance and acquires the data, performing automatic fetches if necessary, and integrates acquired data with any previously obtained data in the dataset.
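
The four possible results of the initial connection described above can be sketched as a small decision function. This is a minimal illustration only: the names `ProbeResult`, `RegistrationOutcome`, and `decide_outcome`, and the order in which the checks are made, are assumptions for the sake of the example and are not part of the Integrated Observatory design.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RegistrationOutcome(Enum):
    """The four results named in the summary (hypothetical identifiers)."""
    REQUEST_MORE_METADATA = auto()
    FORWARD_TO_OPERATOR = auto()
    BEGIN_ACQUISITION = auto()
    ERROR = auto()


@dataclass
class ProbeResult:
    """Hypothetical summary of the system's first connection to the data source."""
    reachable: bool
    missing_metadata: list = field(default_factory=list)
    needs_operator_approval: bool = False


def decide_outcome(probe: ProbeResult) -> RegistrationOutcome:
    # Assumed ordering: a failed connection is an error; otherwise missing
    # metadata is requested before any approval or acquisition step.
    if not probe.reachable:
        return RegistrationOutcome.ERROR
    if probe.missing_metadata:
        return RegistrationOutcome.REQUEST_MORE_METADATA
    if probe.needs_operator_approval:
        return RegistrationOutcome.FORWARD_TO_OPERATOR
    return RegistrationOutcome.BEGIN_ACQUISITION
```

For example, `decide_outcome(ProbeResult(reachable=True, missing_metadata=["contact"]))` yields `RegistrationOutcome.REQUEST_MORE_METADATA`, which corresponds to prompting the registrant for more information.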


  • The provider protocol, data format, and data "class" (see Comments section) are supported by an Integrated Observatory dataset agent, which exists and is installed in a capability container.
  • The criteria for the minimum set of metadata to establish the necessary provenance information have been defined (in consultation with the Data Curator).
  • Data from external data sources are only cached for a period of time, not kept indefinitely.
  • If multiple datasets are made available by a data provider, each must be entered individually, or the data may be entered in a scripted way. (GUI/process efficiencies are welcome, but may not be offered in R2.)
  • The descriptive metadata required from the registrant depends on the protocol. (In some cases the system can leverage metadata provided via that protocol.)
  • Artifacts and attributes that result from the acquisition of a dataset are presented in association with the corresponding Integrated Observatory dataset. For example, an external data provider will have a responsible contact (a user identity) and/or organization information.
  • Users may register external datasets that are outside their authority, but review will take place (likely out-of-band) to ensure publishing that data is appropriate.

Initial State

A Data Registrant — an OOI science user, or similarly authorized person — wants to register an external dataset for acquisition by the Integrated Observatory.

Scenario for "Integrate External Dataset" Use Case

  1. Data Registrant prepares by determining any specific information required to register the external dataset.
    1. The Integrated Observatory should provide some guidance as to what information is needed (without having to start a registration process).
    2. Mechanisms for search and selection of such external datasets are out of the scope of this use case. It is assumed the user knows or can determine the URL and other descriptive information for the desired dataset.
  2. Data Registrant registers the external dataset, the corresponding data source, and the external data provider with the system.
    1. The dataset, data source, and external data provider can be found in the Integrated Observatory by any interested user.
    2. The registration may be performed via a user interface, or via scripted data entry (with help of an Integrated Observatory Operator).
    3. One external data provider can have multiple data sources, each of which represents a single service type. (Thus the external data provider may already be registered in the system, and should be selectable if so.)
    4. One data source can provide multiple datasets, each of which is registered separately. (Thus the data source may already be registered in the system, and should be selectable if so.)
    5. At this time, various internal preparation steps may take place (creating a new data topic, registering a dataset agent instance to publish to it, and registering all the associated metadata in the appropriate objects or resources).
  3. <3> The Integrated Observatory connects to the data source, typically to the external dataset connection, to confirm its viability.
    1. Usually the agent tries to connect to the external dataset and access the external dataset content. The connection test may also be performed at the data source level for certain services.
    2. This may be performed immediately upon submission of the initial registration, and if so may result in the acquisition of additional metadata (saving the Data Registrant the trouble of entering it).
    3. If any required metadata is not provided at this point, the Data Registrant is prompted (via UI or email) to complete the process.
  4. Dataset Agent initiates full data acquisition activities.
    1. See UC.R2.01 Acquire Data From External Dataset.
    2. Includes transforms and data publication as specified therein.
  5. Resulting data products are discoverable by users.
    1. Integrated Observatory data products that come from External Datasets are tagged as such, so users can specifically find or avoid all data in that category.
  6. <3> Changes to the external dataset connection may be made by the Data Registrant or Integrated Observatory Operator.
    1. For example, if a different data source begins supplying the same data, such a configuration change reconnects the data stream.
    2. If the external dataset content changes, this constitutes a different dataset, and the Integrated Observatory Operator must make the change.
    3. Changes may require resubmission via script, rather than changes through a user interface.
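
The relationships in step 2 (one external data provider has multiple data sources, one data source provides multiple datasets, and already-registered providers and sources are reused) can be sketched as follows. This is an illustrative model only: the class names, the `register_dataset` method, keying sources by a name, and the `example.org` URLs are all assumptions, not the actual resource registry interface.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """A system/service offering datasets; represents a single service type."""
    name: str
    service_type: str
    datasets: dict = field(default_factory=dict)  # dataset name -> connection URL


@dataclass
class ExternalDataProvider:
    """Organization or person offering the data."""
    name: str
    sources: dict = field(default_factory=dict)  # source name -> DataSource


class Registry:
    """Hypothetical in-memory stand-in for the system's registration store."""

    def __init__(self):
        self.providers = {}

    def register_dataset(self, provider, source, service_type, dataset, url):
        # Reuse the provider and data source if they are already registered.
        prov = self.providers.setdefault(provider, ExternalDataProvider(provider))
        src = prov.sources.setdefault(source, DataSource(source, service_type))
        # Each dataset is registered separately and only once.
        if dataset in src.datasets:
            raise ValueError(f"dataset {dataset!r} is already registered")
        src.datasets[dataset] = url
        return src


registry = Registry()
registry.register_dataset("Example Lab", "lab-server", "OPeNDAP",
                          "buoy-a", "http://example.org/dap/buoy-a")
registry.register_dataset("Example Lab", "lab-server", "OPeNDAP",
                          "buoy-b", "http://example.org/dap/buoy-b")
```

After the two calls, the registry holds one provider with one data source and two datasets, which mirrors the "selectable if already registered" behavior described in steps 2.3 and 2.4.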

Final State

Data source is registered and providing data to the Integrated Observatory.


These comments provide additional context (usually quite technical) for editors of the use case.

The system must support the "class" of dataset the user wishes to register. This could be accomplished by:

  1. The user converting their format to conform to one of the supported classes
  2. An experienced and credentialed developer generating a DatasetAgent to support the new form.
    Even for supported "classes" of data, there likely will be much information required for the DatasetAgent to properly interpret and process a given dataset. Often we will need explicit knowledge of variable mappings, metadata mappings (and additions), and topological/feature mapping. The amount of information is proportional to the complexity of the data (a time series dataset won't require as much information as a model-output file with multiple coordinate axes, such as ROMS) and also varies depending on the "class" (determined by DataSource) of the data (adding a new NetCDF will likely need more information than adding a new SOS dataset).
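
As a rough illustration of the kind of information mentioned above, the following sketch shows what a variable mapping and metadata additions for a simple time series dataset might look like. The configuration keys, the helper `apply_variable_mapping`, and all values are hypothetical; the real DatasetAgent configuration is not defined in this use case.

```python
# Hypothetical per-dataset configuration for a simple time series.
time_series_config = {
    # Source variable name -> internal variable name.
    "variable_map": {"temp": "sea_water_temperature", "time": "time"},
    # Metadata the source does not supply and the registrant must add.
    "metadata_additions": {"institution": "Example Provider"},
    # Topological/feature mapping; a model output would need far more detail.
    "feature_type": "timeSeries",
}


def apply_variable_mapping(record, variable_map):
    """Rename mapped source variables and drop any that have no mapping."""
    return {variable_map[name]: value
            for name, value in record.items()
            if name in variable_map}


mapped = apply_variable_mapping({"temp": 12.5, "time": 0, "qc": 1},
                                time_series_config["variable_map"])
```

A model-output dataset with multiple coordinate axes would need additional entries (for example, axis and grid descriptions), which is why the amount of required information grows with the complexity of the data.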

Uploading a single artifact (a data set file, a document) is sufficiently simple that this use case, and its user interface, is not applicable. We have not written a custom user interface (or use case) for this situation. (In R2, we may not ingest these single artifacts as datasets; they may only be accepted as attachments, without conversion to internal formats.)


