Release 2 applies Data Validation in the form of automated QC for core OOI data products and a very limited form of human-in-the-loop (HITL) QC.

In Release 3, the Data Validation capabilities are enhanced and generalized to support user-registered validation procedures and interactive QC within the system.

Overview

Data Validation capabilities validate the data in data products, for instance by applying automated and interactive data quality control (QC).

The Data Validation services are based on a number of enabling infrastructure services, including:

Data Validation Procedures

Availability

The availability of a QC depends on the SAF publications. If a data product in the SAF publications indicates that a QC flag is available, then the QC flags are evaluated at the time of the download request whenever the dataset is downloaded via ERDDAP, and they are included with the download.

Realtime QC Events

ION generates automated QC events when a Global Range QC or a Local Range QC flag is raised at the time the data is ingested from an instrument agent. The event, ParameterQCEvent, is published with the data product identifier as the origin. It lists the name of the input that caused the flag to be generated, as well as the temporal domain values and a description.
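The shape of such an event can be sketched as follows. This is only an illustration, not the ION event definition: the field names (input_name, temporal_values) and the helper qc_events_for_granule are hypothetical, and only the event name, the origin, and the kinds of content it carries come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ParameterQCEvent:
    """Hypothetical stand-in for the event described above."""
    origin: str                   # data product identifier
    input_name: str               # name of the input that caused the flag
    temporal_values: List[float]  # temporal domain values where the flag was raised
    description: str


def qc_events_for_granule(product_id: str, input_name: str,
                          times: Sequence[float],
                          flags: Sequence[int]) -> List[ParameterQCEvent]:
    """Return one event if any flag is 0 (check failed), else an empty list."""
    failed = [t for t, f in zip(times, flags) if f == 0]
    if not failed:
        return []
    return [ParameterQCEvent(
        origin=product_id,
        input_name=input_name,
        temporal_values=failed,
        description=f"{input_name} failed a range check at {len(failed)} time step(s)",
    )]
```

In this sketch a granule with no failing flags produces no event, so subscribers would only be notified when a range check actually fails.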

Automated QC

Automated QC in Release 2 is based on parameter functions executing within the Science coverage model. QC parameters hold the results of parameter functions, which are computed from the available independent and other derived parameters within a coverage. QC parameters are never persisted in Release 2, but can be retrieved and downloaded on request.
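As an illustration of what a QC parameter function computes, here is a minimal sketch of a global range check that uses the flag convention from the table below (1 in range, 0 out of range, -99 missing or not computable). It is not the ION implementation: plain Python lists stand in for coverage arrays, and the function name, the inclusive bounds, and the treatment of None and NaN as missing are assumptions.

```python
import math

MISSING = -99  # flag for values that are missing or could not be computed


def global_range_check(values, lower, upper):
    """Return one flag per input value: 1 in range, 0 out of range, -99 missing.

    Assumes inclusive bounds; None and NaN are treated as missing.
    """
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif lower <= v <= upper:
            flags.append(1)
        else:
            flags.append(0)
    return flags
```

Because the function depends only on the value at each position, it can be evaluated on demand for any requested slice, which fits the Release 2 approach of computing QC parameters at retrieval time instead of persisting them.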

See [here for OOI defined DPS documents] covering QC procedures

Implementation Status:

The following table lists DPS required QC functions and their implementation status in Release 2. See also [science:Lookup Tables for Basic QC Algorithms]:

| QC DPS (with link) | Procedure Name | Description | Needed first | Impl Status | Comments | QC lookup parsing | Shape IN vs shape OUT | Needs Input Range | Non-isomorphic |
| GLBLRNG | Global Range Check QC | An array of 8-bit integers, where 0 indicates that the input value is not within the range and 1 indicates that the input value is within the range. -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Implemented in real-time | May need to change for consistency and to be able to calculate CMBNFLG | OK | SAME | NO | NO |
| LOCLRNG | Local Range Test QC | An array of 8-bit integers, where 0 indicates that the input value is not within the local range, which is a function of time and position. 1 indicates that the value is within range. -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Implemented in real-time | The Python N-D interpolation algorithm yields different outputs than those described in the DPS using the MATLAB code. | OK | SAME | NO | NO |
| SPKETST | Spike Test QC | An array of 8-bit integers, where 0 indicates that the input at this position deviates from a normalized average of the neighbors by a significant amount. 1 indicates that the value falls within the normalized average of the neighboring values. -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Not supported in real-time | The output has the same shape as the input: for every array in, an array out. The value for time step x requires a range of historic and future values. | OK | SAME (returns similar shape array for input array) | YES, historic and future | YES |
| TRNDTST | Trend Test QC | An array of 8-bit integers, where 0 indicates that the difference between this position and the neighboring values falls outside the range of allowed variability. 1 indicates that the difference is normal. -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Not supported in real-time | The implementation deviates from the data product specification by returning an array of values in lieu of a scalar. | | DIFFERENT (returns 1 value for an array of inputs) | | ? |
| STUCKVL | Stuck Value Test QC | An array of 8-bit integers, where 0 indicates that the value has been repeated without any change too many times. 1 indicates that the value is not "stuck". -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Not supported in real-time | | OK | SAME (returns similar shape array for input array) | YES, historic (requires prior n values) | YES |
| GRADTST | Gradient Test QC | An array of 8-bit integers, where 0 indicates that the change in value does not fall within a predetermined range. 1 indicates that the change in value is normal. -99 indicates that the value is either missing or failed to be computed. | Summer 2013 | Not supported in real-time | | OK | | | YES |
| SOLAREL | Solar Elevation QC | Not a QC data product. This is a transform. | | Not Supported | Not a QC data product | N/A | N/A | N/A | N/A |
| MODULUS | Modulus QC | Not a QC data product. This is a transform. | | Not Supported | Not a QC data product | | | | ? |
| INTERP1 | 1-D Interpolation QC | Not a QC data product. This is a transform. | | Not Supported | Not a QC data product | | | | ? |
| POLYVAL | Evaluate Polynomial QC | Not a QC data product. This is a transform. | | Not Supported | Not a QC data product | | DIFFERENT | | LIKELY |
| CMBNFLG | Combined QC Flags | An array of 8-bit integers, where 0 indicates that at least one of the QC input arrays contains a 0. 1 indicates that none of the QC input arrays contains a 0. -99 indicates that the value is either missing or failed to be computed. | TBD | Not supported in real-time | This is a combination of other QC outputs. | ? | ? | ? | ? |
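Two of the procedures in the table can be sketched in a few lines to make the flag semantics concrete. These are simplified illustrations and not the DPS algorithms: the function names are hypothetical, the stuck value test here only flags a point once its run of identical values reaches max_repeats, and the rule that a -99 in any input yields -99 in the combined flag (unless some input is 0) is an assumption.

```python
MISSING = -99  # flag for values that are missing or could not be computed


def stuck_value_test(values, max_repeats):
    """Flag 0 once a value has repeated max_repeats times in a row, else 1."""
    flags = []
    run_length = 0
    previous = None
    for v in values:
        if v is None:
            # A missing value cannot be tested and breaks the current run.
            flags.append(MISSING)
            run_length = 0
            previous = None
            continue
        run_length = run_length + 1 if v == previous else 1
        previous = v
        flags.append(0 if run_length >= max_repeats else 1)
    return flags


def combine_flags(*flag_arrays):
    """Combine QC flag arrays position by position.

    0 if any input is 0; otherwise -99 if any input is -99 (assumption);
    otherwise 1.
    """
    combined = []
    for column in zip(*flag_arrays):
        if 0 in column:
            combined.append(0)
        elif MISSING in column:
            combined.append(MISSING)
        else:
            combined.append(1)
    return combined
```

The stuck value sketch needs the prior values of the series to decide the flag at a given position, which is why the table marks it as needing a historic input range.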

Problems with automated QC

Lookup tables

See [science:Lookup Tables for Basic QC Algorithms].

Luke manually downloads the Google Doc as CSV files and attaches them to the DataProduct resource.

Isomorphic vs Non-isomorphic Functions

Figure. Isomorphic vs Non-isomorphic Functions

Coverage parameter functions may need more input values than a retrieve request asks for. The coverage currently does not retrieve these additional values, which leads to incorrect calculations for some values. The caller has to account for these windows and make the appropriate calls to retrieve.
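The effect of a missing window can be shown with a small sketch. The spike test here is deliberately simplified (deviation from the mean of the neighbors within a fixed half window) and is not the DPS algorithm; the function names, the list-based series, and the index-based request are all hypothetical stand-ins for a coverage retrieve.

```python
def neighbor_spike_test(values, half_window, threshold):
    """1 if a value is within threshold of the mean of its neighbors, else 0.

    Returns -99 for a position that has no neighbors at all.
    """
    flags = []
    n = len(values)
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]
        if not neighbors:
            flags.append(-99)
            continue
        mean = sum(neighbors) / len(neighbors)
        flags.append(1 if abs(values[i] - mean) <= threshold else 0)
    return flags


def spike_flags_for_request(series, start, stop, half_window, threshold):
    """Compute flags for series[start:stop] using a padded retrieval window."""
    lo = max(0, start - half_window)
    hi = min(len(series), stop + half_window)
    padded_flags = neighbor_spike_test(series[lo:hi], half_window, threshold)
    offset = start - lo
    # Trim the padding so only the requested positions are returned.
    return padded_flags[offset:offset + (stop - start)]


series = [10.0, 10.0, 10.0, 10.0, 50.0, 10.0, 10.0]

# Only the requested values are passed in: the good value at index 3 is misflagged.
naive = neighbor_spike_test(series[3:5], 2, 15.0)

# The caller widens the retrieval by the window size, then trims the result.
padded = spike_flags_for_request(series, 3, 5, 2, 15.0)
```

With only the two requested values available, each value sees the other as its sole neighbor and both are flagged. With the padded window, only the actual spike at index 4 is flagged.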

Interactive QC

Interactive QC in Release 2 is based on the following manual (Human In The Loop - HITL) procedure, strictly available only to OOI data operators:

  1. Download a chunk of data for a data product via ION, e.g. a day's worth
  2. Apply a defined external tool (e.g. Matlab) to fill out existing interactive QC parameter values. Save as a file.
    1. Note: it is not possible to modify existing automated QC parameter values or existing science measurement values
    2. Must use a defined file format and only expected contents
  3. Upload interactive QC file using an ION operator mechanism (e.g. UI page or operator script or file drop directory location)
  4. An external dataset agent scans a directory for new interactive QC files, associates them with existing data products and publishes the interactive QC parameters as a granule
  5. Independent parameters are ingested from the granule

The prerequisite is that all interactive QC parameters are defined by ION resource import or by an operator, before the HITL procedure occurs.
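The constraint that an uploaded file may contain only the expected contents could be enforced with a check along these lines. This is a sketch under assumptions: the defined file format is not described here, so the check works on parameter names only, and the function name, the example parameter names, and the "time" temporal parameter are hypothetical.

```python
def unexpected_upload_parameters(uploaded_names, interactive_qc_names,
                                 temporal_name="time"):
    """Return the uploaded parameter names that are not allowed.

    Only the temporal parameter and the predefined interactive QC parameters
    are accepted, so science measurements and automated QC parameters cannot
    be overwritten through an upload. An empty result means the upload is
    acceptable.
    """
    allowed = set(interactive_qc_names) | {temporal_name}
    return sorted(name for name in uploaded_names if name not in allowed)


# Hypothetical example: one acceptable upload and one that is rejected.
ok = unexpected_upload_parameters(["time", "temp_hitl_qc"], ["temp_hitl_qc"])
bad = unexpected_upload_parameters(
    ["time", "temp", "temp_glblrng_qc", "temp_hitl_qc"], ["temp_hitl_qc"])
```

Rejecting the whole file when the result is non-empty keeps the ingest step simple, since the granule then only ever carries interactive QC parameters that were defined beforehand.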

Release 3 and beyond

Interactive QC is a process that combines the automatic generation of derived, qualified data products (and updates to these data products) with interactive (human in the loop) annotation, association, and approval processes. Providing such a strongly user-interface-based capability requires data visualization and interactive workflow support.

For interactive QC, a number of capabilities need to be present, including:

  • Automated QC data product generation (see above)
  • Interactive workflow support (AS R3)
  • Visualization (AS R2)
  • Data association and annotation services (DM R3)
  1. Feb 21, 2011

    Michael Meisinger says:

    There is additional material on the QA/QC pipeline at [Data flow during QC - Lankhorst]