|Release 2 applies Data Validation in the form of automated QC for core OOI data products and a very limited form of human-in-the-loop (HITL) QC.
The Data Validation capabilities are enhanced and generalized in Release 3 towards user-registered validation procedures and interactive QC within the system.
Data Validation capabilities are applied to validate data in data products, for instance to apply automated and interactive data quality control (QC).
The Data Validation services are based on a number of enabling infrastructure services, including:
- Data product generation (SA R2)
- Data processing (SA R2)
- Common data and metadata model (DM R1+2)
- Science coverage model (DM R2)
- Data distribution (DM)
- Process management (CEI)
The availability of QC flags depends on the SAF publications. If a data product under the SAF publications indicates that a QC flag is available, the QC flags are evaluated at the time of an ERDDAP download request and included with the downloaded dataset.
ION generates an automated QC event when a Global Range QC or Local Range QC flag is raised at the time the data is ingested from an instrument agent. The event, ParameterQCEvent, is published with the data product identifier as the origin. It lists the name of the input that caused the flag to be generated, the temporal domain values, and a description.
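The event contents described above can be illustrated with a minimal Python sketch. Only the event name ParameterQCEvent and the listed contents (origin, input name, temporal domain values, description) come from the text; the field names and the helper `build_qc_event` are hypothetical and do not reflect the actual ION event schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class ParameterQCEvent:
    """Hypothetical stand-in for the ION event; field names are assumptions."""
    origin: str                      # data product identifier
    qc_parameter: str                # name of the input that caused the flag
    temporal_values: List[float] = field(default_factory=list)
    description: str = ""


def build_qc_event(data_product_id: str, input_name: str,
                   times: Sequence[float], flags: Sequence[int],
                   test_name: str) -> Optional[ParameterQCEvent]:
    """Return an event listing the times whose flag is 0 (failed), or None."""
    failed_times = [t for t, f in zip(times, flags) if f == 0]
    if not failed_times:
        return None
    return ParameterQCEvent(
        origin=data_product_id,
        qc_parameter=input_name,
        temporal_values=failed_times,
        description=f"{test_name} flagged {len(failed_times)} value(s) of {input_name}",
    )


# Example: one of three ingested values failed the range check.
event = build_qc_event("dp_123", "temp", [0.0, 1.0, 2.0], [1, 0, 1],
                       "Global Range QC")
```

In this sketch no event is built when every flag passes, which mirrors the statement that events are generated only when a flag is raised.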
Automated QC in Release 2 is based on parameter functions executing within the Science coverage model. QC parameters hold the result of parameter functions that compute based on available independent and other derived parameters within a coverage. QC parameters are never persisted in Release 2, but can be retrieved and downloaded on request.
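The on-request evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Science coverage model API: the class `SketchCoverage`, its methods, and the parameter names are all hypothetical. It shows a QC parameter computed from another derived parameter at retrieval time without being stored.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


class SketchCoverage:
    """Hypothetical coverage: stored parameters plus parameter functions."""

    def __init__(self) -> None:
        self._stored: Dict[str, List[float]] = {}
        # name -> (function, names of its input parameters)
        self._functions: Dict[str, Tuple[Callable[..., List], List[str]]] = {}

    def set_values(self, name: str, values: Sequence[float]) -> None:
        self._stored[name] = list(values)

    def add_parameter_function(self, name: str, func: Callable[..., List],
                               inputs: Sequence[str]) -> None:
        self._functions[name] = (func, list(inputs))

    def get_values(self, name: str) -> List:
        if name in self._stored:
            return self._stored[name]
        func, inputs = self._functions[name]
        # Inputs may themselves be derived, so resolve them recursively.
        args = [self.get_values(i) for i in inputs]
        return func(*args)  # computed on request, never written back


cov = SketchCoverage()
cov.set_values("temp_c", [10.0, float("nan"), 12.0])
cov.add_parameter_function(
    "temp_f", lambda c: [v * 9.0 / 5.0 + 32.0 for v in c], ["temp_c"])
cov.add_parameter_function(
    "temp_f_present_qc",
    lambda f: [0 if math.isnan(v) else 1 for v in f], ["temp_f"])
qc = cov.get_values("temp_f_present_qc")
```

Because the result is recomputed on every retrieval, a change to the parameter function takes effect immediately, at the cost of repeating the computation for each request.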
|See [here for OOI defined DPS documents] covering QC procedures|
The following table lists the DPS-required QC functions and their implementation status in Release 2 (see also [science:Lookup Tables for Basic QC Algorithms]):
|QC DPS (with link)||Procedure Name||Description||Needed first||Impl Status||Comments||QC lookup parsing||Shape IN vs shape OUT||Needs Input Range||Non-isomorphic|
|GLBLRNG||Global Range Check QC||An array of 8-bit integers, where 0 indicates that the input is not within the range and 1 indicates that the input value is within the range. -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Implemented in real-time||May need to change for consistency reasons and to be able to calculate CMBNFLG||OK||SAME||NO||NO|
|LOCLRNG||Local Range Test QC||An array of 8-bit integers, where 0 indicates that the input value is not within a local range, which is a function of time and position. 1 indicates that the value is within range. -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Implemented in real-time||The Python N-D interpolation algorithm yields different outputs than the MATLAB code described in the DPS.||OK||SAME||NO||NO|
|SPKETST||Spike Test QC||An array of 8-bit integers, where 0 indicates that the input at this position deviates from a normalized average of the neighbors by a significant amount. 1 indicates the value falls within the normalized average of the neighboring values. -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Not supported in real-time||Output has the same shape as the input: for every array in, an array out. The value for time step x requires a range of historic and future values.||OK||SAME (returns similar shape array for input array)||YES, historic and future||YES|
|TRNDTST||Trend Test QC||An array of 8-bit integers, where 0 indicates that the difference between this position and the neighboring values falls outside the range of allowed variability. 1 indicates that the difference is normal. -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Not supported in real-time||The implementation deviates from the data product specification by returning an array of values in lieu of a scalar.||DIFFERENT (returns 1 value for an array of inputs)||?|
|STUCKVL||Stuck Value Test QC||An array of 8-bit integers, where 0 indicates that the value has been repeated without any change too many times. 1 indicates that the value is not "stuck". -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Not supported in real-time||OK||SAME (returns similar shape array for input array)||YES, historic (requires prior n values)|
|GRADTST||Gradient Test QC||An array of 8-bit integers, where 0 indicates that the change in value does not fall within a predetermined range. 1 indicates that the change in value is normal. -99 indicates that the value is either missing or failed to be computed.||Summer 2013||Not supported in real-time|
|SOLAREL||Solar Elevation QC||Not a QC data product. This is a transform.||Not Supported||Not a QC data product||N/A||N/A||N/A||N/A|
|MODULUS||Modulus QC||Not a QC data product. This is a transform.||Not Supported||Not a QC data product||?|
|INTERP1||1-D Interpolation QC||Not a QC data product. This is a transform.||Not Supported||Not a QC data product||?|
|POLYVAL||Evaluate Polynomial QC||Not a QC data product. This is a transform.||Not Supported||Not a QC data product||DIFFERENT||LIKELY|
|CMBNFLG||Combined QC Flags||An array of 8-bit integers, where 0 indicates that at least one of the QC input arrays contains a 0. 1 indicates that none of the QC input arrays contains a 0. -99 indicates that the value is either missing or failed to be computed.||TBD||Not supported in real-time||This is a combination of other QC outputs.||?||?||?||?|
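The flag conventions in the table above (1 = pass, 0 = fail, -99 = missing or not computable) can be illustrated with a minimal Python sketch of a global range check and a combined flag. This is not the DPS reference implementation. The function names, the inclusive range bounds, and the rule that a 0 takes precedence over a -99 when combining are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

MISSING = -99  # flag for missing or uncomputable values, per the table


def global_range_flags(values: Sequence[Optional[float]],
                       valid_min: float, valid_max: float) -> List[int]:
    """1 if valid_min <= value <= valid_max, 0 if outside, -99 if missing.

    Inclusive bounds are an assumption of this sketch.
    """
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif valid_min <= v <= valid_max:
            flags.append(1)
        else:
            flags.append(0)
    return flags


def combine_flags(*flag_arrays: Sequence[int]) -> List[int]:
    """Per position: 0 if any input is 0, else -99 if any is -99, else 1.

    The precedence of 0 over -99 is an assumption of this sketch.
    """
    combined = []
    for position in zip(*flag_arrays):
        if 0 in position:
            combined.append(0)
        elif MISSING in position:
            combined.append(MISSING)
        else:
            combined.append(1)
    return combined


temp_flags = global_range_flags([5.0, 45.0, float("nan")],
                                valid_min=-2.0, valid_max=40.0)
other_flags = [1, 1, 1]  # e.g. the output of another QC test
combined = combine_flags(temp_flags, other_flags)
```

Both functions return one flag per input value, which corresponds to the SAME entry in the shape column for GLBLRNG.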
See [science:Lookup Tables for Basic QC Algorithms].
Luke manually downloads the GoogleDoc lookup tables as CSV files and attaches them to the DataProduct resource.
Figure. Isomorphic vs Non-isomorphic Functions
Coverage parameter functions may need more input values than a retrieve request asks for, e.g. historic or future neighbors of the requested range. The coverage currently does not retrieve these additional values, which leads to incorrect results for some values. The caller has to account for these windows and make the appropriate retrieve calls.
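The windowing problem can be illustrated with a small Python sketch. It uses a simplified stuck value test that flags a value once it ends a run of `num_repeats` identical values, so each result depends on `num_repeats - 1` prior values. The `retrieve` callable, the index-based range, and the helper `retrieve_with_history` are hypothetical and stand in for whatever retrieve mechanism the caller uses.

```python
from typing import Callable, List, Sequence


def stuck_value_flags(values: Sequence[float], num_repeats: int) -> List[int]:
    """Simplified test: 0 once a value has repeated num_repeats times, else 1."""
    flags = []
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if i > 0 and v == values[i - 1] else 1
        flags.append(0 if run >= num_repeats else 1)
    return flags


def retrieve_with_history(retrieve: Callable[[int, int], Sequence[float]],
                          start: int, stop: int, history: int,
                          func: Callable[[Sequence[float]], List[int]]) -> List[int]:
    """Retrieve [start - history, stop), apply func, keep flags for [start, stop)."""
    padded_start = max(0, start - history)
    padded_values = retrieve(padded_start, stop)
    flags = func(padded_values)
    return flags[start - padded_start:]


series = [1.0, 2.0, 2.0, 2.0, 2.0, 3.0]
retrieve = lambda start, stop: series[start:stop]   # hypothetical retrieve call
check = lambda vals: stuck_value_flags(vals, num_repeats=3)

naive = check(retrieve(3, 6))                        # no history: run is cut off
padded = retrieve_with_history(retrieve, 3, 6, history=2, func=check)
```

Here the naive call misses the stuck run because the repeated values before index 3 were not retrieved, while the padded call matches the flags computed over the whole series. A test that also needs future values, such as the spike test, would require analogous padding at the end of the range.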
Interactive QC in Release 2 is based on the following manual (Human In The Loop - HITL) procedure, strictly available only to OOI data operators:
- Download a chunk of data for a data product via ION, e.g. a day's worth
- Apply a defined external tool (e.g. MATLAB) to fill out existing interactive QC parameter values. Save as a file.
- Note: it is not possible to modify existing automated QC parameter values or existing science measurement values
- Must use a defined file format and only expected contents
- Upload interactive QC file using an ION operator mechanism (e.g. UI page or operator script or file drop directory location)
- An external dataset agent scans a directory for new interactive QC files, associates them to existing data products and publishes the interactive QC parameters as a granule
- Independent parameters are ingested from the granule
The prerequisite is that all interactive QC parameters are defined by ION resource import or by an operator, before the HITL procedure occurs.
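The constraints in the procedure above (a defined file format, only expected contents, and no changes to automated QC or science values) suggest a validation step before an uploaded file is published. The following Python sketch shows one way such a check might look. The CSV layout, the `time` column, the parameter names, and the allowed flag values are all assumptions, since the actual interactive QC file format is not specified here.

```python
import csv
import io
from typing import List, Set

ALLOWED_FLAGS = {"0", "1", "-99"}  # assumption: same values as automated QC


def validate_interactive_qc(text: str, time_column: str,
                            interactive_qc_params: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems: List[str] = []
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    if time_column not in columns:
        problems.append(f"missing time column '{time_column}'")
    # Only the time column and predefined interactive QC parameters are allowed.
    for name in columns:
        if name != time_column and name not in interactive_qc_params:
            problems.append(f"unexpected column '{name}'")
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        for name in columns:
            if name in interactive_qc_params and row[name] not in ALLOWED_FLAGS:
                problems.append(
                    f"line {line_number}: invalid flag '{row[name]}' in '{name}'")
    return problems


defined = {"temp_hitl_qc"}  # hypothetical predefined interactive QC parameter
good = "time,temp_hitl_qc\n100,1\n101,0\n"
bad = "time,temp_hitl_qc,temp_glblrng_qc\n100,1,1\n101,7,1\n"
good_problems = validate_interactive_qc(good, "time", defined)
bad_problems = validate_interactive_qc(bad, "time", defined)
```

Rejecting any column that is not a predefined interactive QC parameter is one way to enforce that automated QC values and science measurements cannot be modified through an upload.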
Interactive QC is a process that combines the automatic generation of derived, qualified data products (and updates to these data products) with interactive (human in the loop) annotation, association, and approval processes. Providing such a strongly user-interface-based capability requires data visualization and interactive workflow support.
For interactive QC, a number of capabilities need to be present, including:
- Automated QC data product generation (see above)
- Interactive workflow support (AS R3)
- Visualization (AS R2)
- Data association and annotation services (DM R3)