{info}This page describes how OOINet computes status for its sites and devices{info}

h2. Summary

There are 4 defined status categories for devices and sites within OOINet:

* *Power status*: Status, warnings and alerts related to device and device port power. E.g. power surges at the port an instrument is connected to
* *Comms status*: Status, warnings and alerts related to device connectivity and telemetry to remote devices. E.g. satellite communications distortions or delays and direct communications with the instrument via the port agent
* *Data status*: Status, warnings and alerts related to abnormalities with data acquired from devices. E.g. "stuck values", spikes and out-of-range values coming from data validation (QC)
* *Location status*: Status, warnings and alerts related to the actual position of a device compared to its nominal position. E.g. a platform leaving its watch circle

Each of the above categories can have one of the following alert levels.
* *All clear* (green): No anomalies currently known for this category
* *Warning* (orange): A potential anomaly or a value close to alert range detected
* *Alert* (red): An anomaly detected for this category
* *Unknown* (black): Status currently not available or cannot be computed

Other status and state types exist and are explained below, including:

* Aggregate status for a device: Combination of the 4 status categories above to provide one combined status for a device, e.g. to appear in tables
* Roll-up status for portals, stations, sites and facilities
* Network status for the cabled network
* Agent and agent driver state (with respect to a defined state machine model)
* Resource lifecycle state


*See Also:*

* [syseng:CIAD DM SV Events and Notifications]
* [syseng:CIAD MI OV Instrument State Models]
* [syseng:CIAD MI OV Instrument and Platform Agents]

h2. Status Behavior


h3. Device Status

Device status is inferred from various information sources about a device, including:

* Instrument Agent for a cabled instrument
* Platform Agent for a cabled node
* Dataset Agent for uncabled devices or devices producing data files
* "Parent" platform or dataset agent for an instrument device
* Ingestion process for an agent data stream
* Delayed QC data post-processor process
* Operator generated or cleared

In general, the status of a device is considered "All Clear" (green) if information about this device is available and recent (see below). That is, the absence of any warnings or alerts implies "All Clear" status. If no recent information about the device is available, the status is set to "Unknown".

h4. Power Status

Status, warnings and alerts related to device and device port power. E.g. power surges at the port an instrument is connected to

* For a cabled instrument, the immediate node to which the instrument is connected provides the power status. Status includes any deviations from expected voltage draws and errors such as ground faults. The instrument does not have to be powered up for power status to be known.
* For an uncabled instrument, the power status can be inferred from status data files parsed by a parent platform level dataset agent.

h4. Comms Status

Status, warnings and alerts related to device connectivity and telemetry to remote devices. E.g. satellite communications distortions or delays and direct communications with the instrument via the port agent

* For a cabled instrument, the comms status is a combination of information inferred from the device's agent and agent driver and any information coming from the instrument's parent platform (node).
* For an uncabled instrument, the comms status may be inferred from status data files parsed by an OMC or platform level dataset agent, e.g. comms records. It may also be partially inferred from the arrival of new data files in expected time intervals.

h4. Data Status

Status, warnings and alerts related to abnormalities with data acquired from devices. E.g. "stuck values", spikes and out-of-range values coming from data validation (QC)

* Inferred as a combination of the agent parsing measurements and running real-time QC algorithms, plus alerts coming from delayed QC post-processors (e.g. when a time period of samples is required as for spike tests). See [syseng:CIAD SA OV Data Validation].

h4. Location Status

Status, warnings and alerts related to the actual position of a device compared to its nominal position. E.g. a platform leaving its watch circle

* For a cabled instrument, this information is not relevant. Cabled instruments are stationary and location is not measured. Location is metadata entered manually into the system.
* For an uncabled instrument, the location status can be inferred from engineering data files parsed by a parent platform level dataset agent, e.g. current or most recent GPS readings.

h3. Device Status Availability

{note}This is under refinement for Release 3{note}

Whether device status is considered available, e.g. to an operator checking a device facepage in the UI, is influenced by the following conditions:

* At some point in the past, status information was known about the device - mandatory
* An agent (resp. agent driver) has an active connection to the device (i.e. agent in state IDLE or higher)
* The agent is not currently in direct access mode (not applied in Release 2)
* The information known about the device is not older than an "acceptable" limit (not applied in Release 2)

If device status is not available it is represented as "Unknown".
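
The conditions above can be sketched as a simple check. This is a minimal illustration, not the OOINet implementation: the {{DeviceInfo}} type, the {{status_available}} function and all parameter names are hypothetical. It assumes that only the first condition is mandatory and that the remaining ones are optional checks, with defaults that mirror Release 2 (direct access and age limit not applied).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DeviceInfo:
    """Hypothetical summary of what is known about one device."""
    last_status_time: Optional[datetime]  # None: no status was ever known
    agent_connected: bool                 # agent in state IDLE or higher
    direct_access: bool = False           # agent currently in direct access mode


def status_available(info, now, require_connection=True,
                     check_direct_access=False, max_age=None):
    """Return True if device status may be shown, False for "Unknown"."""
    # Mandatory: status information was known at some point in the past.
    if info.last_status_time is None:
        return False
    # Assumption: an active agent connection is treated as a switchable check.
    if require_connection and not info.agent_connected:
        return False
    # Not applied in Release 2, so disabled by default.
    if check_direct_access and info.direct_access:
        return False
    # Not applied in Release 2: the "acceptable" age limit is a parameter here.
    if max_age is not None and now - info.last_status_time > max_age:
        return False
    return True


now = datetime(2014, 1, 1, 12, 0)
info = DeviceInfo(last_status_time=now - timedelta(hours=2), agent_connected=True)
print(status_available(info, now))                                # True
print(status_available(info, now, max_age=timedelta(minutes=30)))  # False
```

The age limit is passed in rather than hard-coded because the text leaves the "acceptable" limit undefined.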

h3. Aggregate Status

The aggregate status is computed as a combination of the device status values:
* If all of the device status values are "Unknown", the aggregate status value is "Unknown".
* If there is any "Alert" status, then the aggregate status value is "Alert".
* If there is any "Warning" status and no "Alert", then the aggregate status value is "Warning".
* Otherwise the aggregate status shows "All Clear". This implies that at least one status value is "All Clear" and any remaining values are "All Clear" or "Unknown".
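
The rules above can be written down directly. The following is a minimal sketch; the {{Status}} enum and the {{aggregate_status}} function are illustrative names, not OOINet identifiers. Treating an empty input as "Unknown" is an assumption, since the text does not cover that case.

```python
from enum import Enum


class Status(Enum):
    UNKNOWN = "Unknown"
    ALL_CLEAR = "All Clear"
    WARNING = "Warning"
    ALERT = "Alert"


def aggregate_status(values):
    """Combine the per-category status values of one device."""
    values = list(values)
    # Assumption: no values at all is treated like "all Unknown".
    if not values or all(v is Status.UNKNOWN for v in values):
        return Status.UNKNOWN
    if Status.ALERT in values:
        return Status.ALERT
    if Status.WARNING in values:
        return Status.WARNING
    # At least one value is All Clear; the rest are All Clear or Unknown.
    return Status.ALL_CLEAR


device = {
    "power": Status.ALL_CLEAR,
    "comms": Status.WARNING,
    "data": Status.UNKNOWN,
    "location": Status.ALL_CLEAR,
}
print(aggregate_status(device.values()).value)  # Warning
```

Note that a single "Unknown" category does not hide a known value: one "All Clear" and three "Unknown" values still aggregate to "All Clear".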

h3. Roll-Up Status

Devices (in cases of assemblies) and Sites have a roll-up status. This is a status that includes the aggregate status for the specific device combined with the aggregated status for all the child devices or sites. For instance, the PlatformSite (station in user terms) related to the top-level platform assembly for a CG surface mooring has a status that is the rollup of this platform's components (e.g. mooring float, riser, benthic package) with all their assembled instruments. Status roll-up is computed even if status for certain components is unknown.

!https://docs.google.com/drawings/d/1kZ_L4xr4Be0OdqMDX6tiI50hROgvLHU4HcnD7e_NIKE/pub?w=1200!

_Figure 1. Status roll-up along device and observatory hierarchies (OV-1)_

h4. Device Roll-Up Status

This applies to platform level devices. For instruments, the roll-up status is equal to the device aggregate status.

The roll-up status is computed as a combination of the aggregate status of the device and the aggregate status of all child devices.

* If all of the status values are "Unknown", the roll-up status value is "Unknown".
* If there is any "Alert" status, then the roll-up status value is "Alert".
* If there is any "Warning" status and no "Alert", then the roll-up status value is "Warning".
* Otherwise the roll-up status shows "All Clear". This implies that at least one aggregate status is "All Clear" and any remaining values are "All Clear" or "Unknown".
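
As an illustration, the roll-up can be sketched as a recursion over the device hierarchy. The {{Device}} class and function names below are hypothetical. The sketch assumes that the roll-up of a platform includes all descendants (components and their assembled instruments), which follows from applying the same rules at every level.

```python
from dataclasses import dataclass, field

UNKNOWN, ALL_CLEAR, WARNING, ALERT = "Unknown", "All Clear", "Warning", "Alert"


def combine(values):
    """Apply the four combination rules to a list of status values."""
    if all(v == UNKNOWN for v in values):
        return UNKNOWN
    if ALERT in values:
        return ALERT
    if WARNING in values:
        return WARNING
    return ALL_CLEAR


@dataclass
class Device:
    name: str
    aggregate: str = UNKNOWN               # aggregate status of this device
    children: list = field(default_factory=list)


def rollup_status(device):
    """Roll-up status: own aggregate status combined with all child roll-ups."""
    # For an instrument (no children) this equals its aggregate status.
    return combine([device.aggregate] + [rollup_status(c) for c in device.children])


ctd = Device("CTD instrument", ALERT)
riser = Device("riser", ALL_CLEAR, [ctd])
float_ = Device("mooring float")            # status unknown
mooring = Device("surface mooring", ALL_CLEAR, [float_, riser])

print(rollup_status(float_))   # Unknown
print(rollup_status(mooring))  # Alert
```

The unknown status of the mooring float does not prevent the roll-up for the mooring from being computed.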

h4. Portal, Station, Site and Facility Roll-Up Status

This applies to observatory level resources, including:

* Instrument Portal (InstrumentSite resource type)
* Platform Portal (PlatformSite resource type for non top-level platforms)
* Station (PlatformSite resource type for top-level platforms)
* Site (Observatory resource type)
* Facility (Org resource type)

A portal or station can only have a status other than "Unknown" if there is an active primary deployment to this resource. E.g., an instrument portal only shows a status if an active Deployment resource exists that connects the portal and the instrument device for a period of time.

The rules to compute roll-up status match the ones for device roll-up status.
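
A rough sketch of the site roll-up follows. The {{Site}} class and its fields are hypothetical; a missing active primary deployment is modeled as {{None}}. The combination rules are equivalent to taking the maximum under the ordering Unknown < All Clear < Warning < Alert, which is the formulation used here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Ordering under which "take the maximum" reproduces the roll-up rules.
ORDER = ["Unknown", "All Clear", "Warning", "Alert"]


def combine(values):
    """Return the highest-ranked status, or "Unknown" for no values."""
    return max(values, key=ORDER.index, default="Unknown")


@dataclass
class Site:
    name: str
    # Roll-up status of the deployed device; None means that there is
    # no active primary deployment to this portal or station.
    deployed_device_status: Optional[str] = None
    children: list = field(default_factory=list)


def site_rollup(site):
    """Roll-up status for a portal, station, site or facility."""
    own = site.deployed_device_status or "Unknown"
    return combine([own] + [site_rollup(child) for child in site.children])


portal_a = Site("instrument portal A", "Warning")
portal_b = Site("instrument portal B")            # nothing deployed
station = Site("station", "All Clear", [portal_a, portal_b])
facility = Site("facility", None, [station])

print(site_rollup(portal_b))  # Unknown
print(site_rollup(facility))  # Warning
```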

h3. Agent and Agent Driver Status

The agent state is internal to the OOINet operations and only relevant to device operator users. The agent is the software component representing a device within the OOINet. Therefore it is substantially related to producing information for a device's status. It is, however, not the only source of information about a device (the parent agent, ingestion and QC post-processors are other sources as explained above). It is therefore not required to have a running agent to know some device status.

The agent driver is managed by the agent. The device itself is managed or represented by the driver. The agent state follows a defined state machine. Composite states reflect the states of the agent driver and the device itself. 

See [syseng:CIAD MI OV Instrument State Models] for more details.

h3. Resource Lifecycle State

This state represents a process workflow state related to an OOINet resource, not an operational status (measured or inferred) of a device. For example, the lifecycle state indicates whether an instrument is planned to exist in the future within the observatory, exists right now but is not deployed, or is deployed in its target deployment environment (e.g. at sea).

See [syseng:CIAD COI OV Resource Lifecycle] for more details.

h3. Network Status

Every device on the cabled RSN network is connected, through a chain of network links, to a shore station. There is currently one distinct network path for all devices to the shore, but redundancy in network links is part of the RSN cable design and may exist in the future. Network links between devices are captured in the form of "hasNetworkParent" associations between Device resources.

Network status is not presented to the user in Release 2 but can be computed for Release 3 in the following way:

* If the status values for both a device and its connected network parent are "Unknown", the network status is set to "Unknown".
* If there is any "Alert" status for the device or its connected network parent, then the network status value is "Alert".
* If there is any "Warning" status and no "Alert" for the device or its connected network parent, then the network status value is "Warning".
* Otherwise the network status shows "All Clear". This implies that at least one of the two status values (device or connected network parent) is "All Clear" and the other is "All Clear" or "Unknown".
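
A minimal sketch of this computation is shown here. The "hasNetworkParent" associations are modeled as a plain dictionary from device id to parent id; the function name, the device ids and the dictionary layout are illustrative assumptions.

```python
UNKNOWN, ALL_CLEAR, WARNING, ALERT = "Unknown", "All Clear", "Warning", "Alert"


def network_status(device_id, status_by_device, network_parent):
    """Network status from a device and its connected network parent."""
    values = [status_by_device.get(device_id, UNKNOWN)]
    parent_id = network_parent.get(device_id)
    if parent_id is not None:
        values.append(status_by_device.get(parent_id, UNKNOWN))
    if all(v == UNKNOWN for v in values):
        return UNKNOWN
    if ALERT in values:
        return ALERT
    if WARNING in values:
        return WARNING
    return ALL_CLEAR


# Hypothetical "hasNetworkParent" associations: device id -> parent id.
network_parent = {"instrument-1": "node-1", "node-1": "shore-station"}
status_by_device = {"instrument-1": ALL_CLEAR, "node-1": WARNING}

print(network_status("instrument-1", status_by_device, network_parent))  # Warning
print(network_status("node-1", status_by_device, network_parent))        # Warning
```

Only the immediate network parent is considered, as in the rules above. Following the chain of network links to the shore station would be an extension that this sketch does not attempt.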

See [CIDev:OOI Asset Organization].

h3. Status Event Detection and Persistence

Device agents, such as Instrument and Platform Agents, monitor the acquired sample and engineering measurements (and the absence thereof) for warning and abnormal conditions. They keep a current status level for each of the main categories defined above, as well as for individually defined "monitorables". When the status level changes in either direction (e.g. from All Clear to Warning, or back), an event is published. Events are also produced when warnings and alerts are cleared.
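
The change detection described above can be sketched as follows. The {{StatusMonitor}} class, the event dictionary layout and the use of a plain callback for publishing are assumptions made for illustration; the sketch does not reflect the actual agent or event APIs.

```python
class StatusMonitor:
    """Tracks the current level per category and publishes on change."""

    def __init__(self, publish):
        self._levels = {}        # category or monitorable -> current level
        self._publish = publish  # callback that receives each event

    def update(self, category, level):
        """Record a newly evaluated level; publish only if it changed."""
        previous = self._levels.get(category, "Unknown")
        if level == previous:
            return False
        self._levels[category] = level
        # Changes are published in both directions, including clearing.
        self._publish({"category": category, "previous": previous, "level": level})
        return True


events = []
monitor = StatusMonitor(events.append)
monitor.update("power", "All Clear")   # Unknown -> All Clear: event
monitor.update("power", "All Clear")   # unchanged: no event
monitor.update("power", "Warning")     # event
monitor.update("power", "All Clear")   # warning cleared: event
print(len(events))  # 3
```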

The system maintains a persistent projection with the last known status values per device. The device state persister is a plug-in to the event persister, which writes all events to the persistent store. The device state persister receives all state, status and alert events for all OOINet devices and projects any relevant information into a state profile per device. This state profile can be retrieved, e.g. when a UI facepage is requested. The combination of multiple device state profiles according to the defined observatory structure leads to the computation of aggregate and roll-up status.
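
The projection idea can be illustrated with an in-memory sketch. The class name, the event fields ({{device_id}}, {{category}}, {{level}}) and the dictionary-based store are hypothetical; the real persister writes to a persistent store and handles further event types.

```python
from collections import defaultdict


class DeviceStateProjection:
    """Keeps the last known status per device and category."""

    def __init__(self):
        self._profiles = defaultdict(dict)  # device id -> {category: level}

    def on_event(self, event):
        """Project one status event into the device's state profile."""
        self._profiles[event["device_id"]][event["category"]] = event["level"]

    def profile(self, device_id):
        """Return a copy of the state profile (empty if nothing is known)."""
        return dict(self._profiles.get(device_id, {}))


projection = DeviceStateProjection()
projection.on_event({"device_id": "ctd-1", "category": "power", "level": "All Clear"})
projection.on_event({"device_id": "ctd-1", "category": "comms", "level": "Warning"})
projection.on_event({"device_id": "ctd-1", "category": "comms", "level": "All Clear"})

print(projection.profile("ctd-1"))  # {'power': 'All Clear', 'comms': 'All Clear'}
print(projection.profile("ctd-2"))  # {}
```

Because only the latest value per category is kept, a facepage request can read the profile without replaying the event history.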

h2. Status Event Production

The figure below shows various cases of device status event production:

!https://docs.google.com/drawings/d/1rzdpZXczW-e59wjIZzyuUHUEHccPOlirA6B3vioM2RQ/pub?w=1062&h=754!

_Figure 2. Status event production for the main deployment cases (OV-1)_

There are 4 main cases of status event production:

* Cabled devices: An agent continuously connected to the device and/or the RSN Observatory Management System (OMS) monitors real-time values and detects and clears alerts. Most agents have parents.
* Uncabled devices: Information about these platforms and instrument measurements is relayed to ION via data file interfaces, specified in the form of IDD documents. Dataset agents monitor data file directories for new arrivals, parse the files and publish data and events. Event relay lag depends on the timeliness of file availability to OOINet. Dataset agents exist for each instrument and at the platform level.

Additional detail may be associated with a status value or alert level, such as specific engineering data values (e.g. voltage levels).

h2. Agent Stream Alerts

Agents can be configured to monitor parameters using alert detectors, and emit ALL_CLEAR, WARNING, and ALERT events in case the alert status changes.

See [CIDev:Agent Stream Alerts] for implementation details.
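
To illustrate the general idea only, a threshold-based detector might look like the sketch below. It is not the Agent Stream Alerts implementation: the {{RangeAlertDetector}} class, its fields and the event layout are hypothetical, and the detector is assumed to start in the ALL_CLEAR state.

```python
from dataclasses import dataclass


@dataclass
class RangeAlertDetector:
    """Hypothetical detector with a warning band inside an alert band."""
    parameter: str
    warn_low: float
    warn_high: float
    alert_low: float
    alert_high: float
    level: str = "ALL_CLEAR"   # assumed initial state

    def evaluate(self, value):
        """Return an event dict if the alert status changed, else None."""
        if value < self.alert_low or value > self.alert_high:
            new_level = "ALERT"
        elif value < self.warn_low or value > self.warn_high:
            new_level = "WARNING"
        else:
            new_level = "ALL_CLEAR"
        if new_level == self.level:
            return None
        self.level = new_level
        return {"parameter": self.parameter, "type": new_level, "value": value}


detector = RangeAlertDetector("port_voltage", warn_low=11.0, warn_high=13.0,
                              alert_low=10.0, alert_high=14.0)
for sample in (12.0, 13.5, 13.6, 15.0, 12.0):
    event = detector.evaluate(sample)
    if event is not None:
        print(event["type"], event["value"])
```

With the samples above, events are emitted for 13.5 (WARNING), 15.0 (ALERT) and the final 12.0 (ALL_CLEAR). The samples 12.0 (first) and 13.6 leave the status unchanged and produce no event.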

h2. Alert Display and Management

Users have 2 ways to be informed about device status events, such as alerts.
# Via the UI, by accessing a status page
# Via the notification (subscription) system with notifications delivered as SMS text messages or emails as configured

The [SA Observatory Management Service|syseng:CIAD SA OV Observatory Management Service] has operations to retrieve observatory status for an Org or Site with all its defined child sites and associated deployed devices. This status computation is performed on status retrieval, based on the last known status information persisted by the device state persister (see above).