
Overview of "DataProduct" Resource

Represents a uniquely identified presentation of a defined collection of information. The presentation consists of a particular subset of the collection. A DataProduct may have a real-time stream and may have persisted historic data.

Metadata

Refer to the CIAD APP Resource Model page for metadata definitions.

Responsible Service: Data Product Management Service
Architecture References: CIAD SA OV Data Product Generation; CIAD OV Resource and Object Model (needs update); CIAD OV Data Flows (needs update)
Other References: Object Spec for DatumCharacterization; Object Spec for DataRecordFormat
Technical Notes:
Version: 1.1
Comments: The attributes necessary to produce the many formats of Data Product. Persistence is NOT implied.
User Facing: Yes
Open Issues/Changes: Exact relationship of Data Product to Data Set. Many issues have been addressed on Discussion page.

Summary

To be a data product in OOI, an entity must (a) have a unique identifier by which it is referenced, and (b) be a subset of a well-defined collection of information. (The subset may be improper, that is, the entire well-defined collection, or it may be a proper subset.)
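As a small illustration of condition (b), the sketch below treats the well-defined collection and a candidate product as Python sets of record identifiers. The function name and the record identifiers are hypothetical; this page does not specify how OOI actually represents collections.

```python
def classify_subset(product_records, collection_records):
    """Classify a product's records relative to its source collection.

    Returns "improper" if the product is the entire collection,
    "proper" if it is a strict subset, and None if it is not a subset.
    """
    product = set(product_records)
    collection = set(collection_records)
    if product == collection:
        return "improper"
    if product < collection:  # strict-subset test on Python sets
        return "proper"
    return None


collection = {"rec-1", "rec-2", "rec-3"}  # hypothetical record identifiers
print(classify_subset({"rec-1", "rec-2"}, collection))
print(classify_subset(collection, collection))
print(classify_subset({"rec-9"}, collection))
```

Only the first two cases would qualify as data products under condition (b); the third draws on records outside the collection.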

The data product contains sufficient metadata or references to metadata to unambiguously describe its content and provenance. An extensible set of formats will be available to the user to request the information in the particular byte order and organization that suits them; the format specification does not change the identity of the data product.

An Invariant Data Product is a temporally invariant data product. Requesting the data corresponding to the Invariant Data Product URL will produce exactly the same data for the life of the system. (Any data or annotations that become known to the system after the Invariant Data Product URL's time mark will not be included in a representation of the data.)
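One way to picture the time mark is as a filter on when the system learned of each record, not on when the measurement was made. The following is a minimal sketch under that assumption; the `Record` class, its `known_at` field, and the sample values are hypothetical and are not part of the resource definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    value: float
    measured_at: datetime  # when the measurement was made
    known_at: datetime     # when the system first learned of the record (assumed field)


def invariant_view(records, time_mark):
    """Return only the records the system knew about at or before time_mark."""
    return [r for r in records if r.known_at <= time_mark]


def utc(day, hour=0):
    """Helper for readable sample timestamps in December 2012 (UTC)."""
    return datetime(2012, 12, day, hour, tzinfo=timezone.utc)


records = [
    Record(10.1, measured_at=utc(1), known_at=utc(1, 1)),
    Record(10.4, measured_at=utc(2), known_at=utc(2, 1)),
    # A late arrival: measured on day 1, but known to the system only on day 5.
    Record(9.9, measured_at=utc(1, 12), known_at=utc(5)),
]

frozen = invariant_view(records, time_mark=utc(3))
print([r.value for r in frozen])  # the late arrival is excluded
```

Because the filter depends only on the fixed time mark, repeating the request later returns the same records even after more data arrives.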

A Core Data Product is a data product that comes from an OOI-owned instrument, including L0 and the derivative L1 and L2 data products that OOI has defined and described and that will be produced and managed with OOI expertise. (The OOI team often refers to this simply as a 'Data Product', and the listener must infer the appropriate meaning.)

Note that the attributes that are stored for the Data Product must be converted into different formats according to different standards. For example, ISO 19115 metadata call for particular attribute names and forms; netCDF CF conventions call for specific attributes within the netCDF form; and ACDD conventions call for additional specific attributes to be provided. The attributes specified here were derived as a common set of information that could be extracted into these different forms, even as they are also useful at a more detailed level internally. (Generating core ISO 19115 metadata for data products is required in Release 2, with more comprehensive ISO 19115 compatibility required for Release 3. netCDF CF is required in Release 2 and beyond; netCDF ACDD conventions are desirable.)

Attributes and Metadata References

References in this section point to the material that describes attributes of the resource, if any.

Generated Specification

  1. Dec 20, 2012

    Michael Meisinger says:

    Attribute List

    Attributes From Resource Type Resource

    Additional comments on these attributes:

    • Name and creation date are required by several Metadata Standards. Name => ISO 19115 CI_Citation.title, Creation date => ISO 19115 CI_Citation.date.
    • For Core Data Products, the Name should come from the official Core Data Products spreadsheet, linked from the [science:Data Product Specifications] page. For L1 and L2 core data products, this name will be referenced by the ATBD.
    • The Description attribute should be used to satisfy the "abstract" requirement of ISO 19115 (=> MD_DataIdentification.abstract) and the "summary" field of ACDD. It is in the ATBD for L1 and L2 core products.
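The mappings named in the bullets above can be sketched as two small export functions. This is an illustration only: the input keys (`name`, `creation_date`, `description`), the function names, and the sample values are hypothetical, and the output is a flat dictionary keyed by the element names cited above, not a complete ISO 19115 or ACDD record.

```python
def to_iso19115(attrs):
    """Map common attributes to the ISO 19115 elements named in the comments."""
    return {
        "CI_Citation.title": attrs["name"],
        "CI_Citation.date": attrs["creation_date"],
        "MD_DataIdentification.abstract": attrs["description"],
    }


def to_acdd(attrs):
    """Map common attributes to the ACDD field named in the comments."""
    return {"summary": attrs["description"]}


# Hypothetical stored attributes for one data product.
example = {
    "name": "Example Data Product",
    "creation_date": "2012-12-20",
    "description": "Illustrative abstract text for the product.",
}

print(to_iso19115(example))
print(to_acdd(example))
```

Keeping one stored attribute set and deriving each standard's view from it is the approach the Summary describes; the functions here only show the shape of that extraction.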

    Attributes From Resource Type Information Resource

    Attributes For This Resource Type

    | Attribute Name (YML) | Attribute Name (user) | Definition | Type/Units | Required? | Comments | Status |
    |---|---|---|---|---|---|---|
    | | Data URL | a link to the (time-varying) data product | URL | Mandatory | Only required here if _id of Resource is not a URL; note time-invariant version below | |
    | | DOI | A Digital Object Identifier | string | Optional | This requires some additional implementation thought - a DOI may be associated with an invariant version of a dataset, or a variant, updating one. DOIs could be associated with sets of products | |
    | | Keywords | lists key words and phrases that are relevant to the dataset | list (of URLs) | Optional (0..*) | | |
    | | ISO topic category | one or more of the topic categories from ISO 19115 | String | Mandatory | typically oceans. Note this must follow controlled vocabulary from ISO 19115 (attribute 'Dataset topic category') | |
    | | IOOS category | Selection from IOOS controlled vocabulary for parameters | String (CV: IOOS parameter list) | Optional | Entries may be composed in Release 3. Here it needs a reference for a controlled vocabulary. | |
    | | Data Record Description | A description of the format and characteristics of the data contents | [Data Record Description] | Required | LCA | |
    | | Primary Comment | miscellaneous information about the dataset | String | Optional | originated in and serves the CF convention (ACDD) | |
    | | Data Product Level | OOI data processing level; L0, L1, or L2. The definitions of these levels are currently in revision, and should be verified from the [OOI Data Products document] when it is finalized. | String (CV: Data Levels) | Mandatory | The OOI levels are analogous to but not the same as NASA levels. Sub-levels may yet be specified (this is likely undesirable). | |
    | | Data Quality Control Level | Terms are currently in review. See [OOI Data Products document] for latest (not necessarily normative) definitions and controlled vocabulary | String | Optional | | |
    | | Version Status | Indicates whether this is a deprecated dataset | String (CV: Deprecated, Current) | Mandatory | Will this duplicate lifecycle state? JBG guess it may | |
    | | Curation Category | Used to group data into categories that will have different data curation requirements, processes, or policies | String (CV) | Mandatory | CV: "core", "OOI non-core science", "OOI non-science", "PI", "external" (others may be possible) | |
    | | CDM Data Type | THREDDS data type appropriate for this dataset. E.g., "Grid", "Image", "Station", "Trajectory", "Radial" (ACDD) | String (CV) | Mandatory | From ACDD standard, CDM. | |
    | | ISO Spatial Representation Type | ISO Spatial Representation Type | String (CV: Vocabulary: Grid, vector, textTable, tin, stereoModel, video) | Optional | Optional for ISO 19115 (=> MD_DataIdentification.spatialRepresentationType) | |
    | | License** | Describes the permissions and restrictions for access to and distribution of OOI data; presumed to be legally enforceable language. | String | Mandatory | Required by several metadata standards. The license will be determined by policy for all OOI data. | |
    | | License URI | Reference identifier to an explicit license, as described in License attribute. | URL | Optional | Ideally, should be resolvable. Makes the license specification computable. May be specified after R2 release. | |
    | | Data Contact** | Provides the name and contact information for one or more persons associated with the dataset. Repeatable. | Contact (Object) | Mandatory (1..*) | Needed for ISO (=> CI_ResponsibleParty) and ACDD standard metadata. The Publisher would presumably be OOI, or similar, in all cases. For Core Data Products, the creator and contributor should perhaps also be OOI. (Where 'OOI' means a virtual entity, similar to description for Metadata contact.) | |
    | | Metadata contact** | the name and contact info for the metadata creator | Contact (Object) | Mandatory | Required for ISO 19115 core metadata. The metadata point of contact should reference an invariant contact like "Data Manager" with an email of something like "datamanager@oceanobservatories.org" that can be forwarded to the appropriate person over time. | |
    | | Provider Project** | The name of the project from which the data came. | String (CV: tbd) | Optional | From "project" in ACDD standard. For core data products, it will be Ocean Observatories Initiative. For external data, it will be obtained as part of the metadata provided for those data. | |
    | | Provider Acknowledgement** | A place to acknowledge various types of support provided by the project that produced the data | string | Optional | From ACDD standard. This is text provided for scientific attributions, but is not enforceable. | |
    | | exclusive_rights_status | Indicates whether the dataset is under proprietary hold or whether it is public | string: unrestricted, temporary_hold, permanent_hold | Mandatory | Temporary holds of up to 1 year for PI-owned instruments are currently allowed under the draft data policy. Permanent holds are not expected, but are theoretically possible based on national security concerns | |
    | | exclusive_rights_end_date | For temporary data holds, the date on which the hold will expire | datetime | Mandatory | | |
    | | exclusive_rights_contact | the name and contact information of the person/entity who requested the exclusive rights | contact (Object) | Mandatory | | |
    | | exclusive rights notes | Further information about the exclusive rights hold, such as a justification for a permanent hold | string | Optional | | |
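The exclusive-rights attributes above lend themselves to a simple consistency check. The sketch below is illustrative only: the function names are hypothetical, and two rules are assumptions rather than stated policy, namely that the end date is needed only for a temporary hold, and that a temporary hold lapses once its end date has passed.

```python
from datetime import datetime, timezone

# Controlled vocabulary taken from the exclusive_rights_status row above.
EXCLUSIVE_RIGHTS_STATUSES = {"unrestricted", "temporary_hold", "permanent_hold"}


def validate_exclusive_rights(status, end_date=None):
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    if status not in EXCLUSIVE_RIGHTS_STATUSES:
        problems.append(f"unknown exclusive_rights_status: {status!r}")
    # Assumption: the end date is only meaningful for temporary holds.
    if status == "temporary_hold" and end_date is None:
        problems.append("temporary_hold requires exclusive_rights_end_date")
    return problems


def is_publicly_accessible(status, end_date, now):
    """Assumed rule: a temporary hold lapses once its end date has passed."""
    if status == "unrestricted":
        return True
    if status == "temporary_hold" and end_date is not None:
        return now >= end_date
    return False


hold_ends = datetime(2013, 12, 20, tzinfo=timezone.utc)
print(validate_exclusive_rights("temporary_hold"))
print(is_publicly_accessible("temporary_hold", hold_ends,
                             now=datetime(2014, 1, 1, tzinfo=timezone.utc)))
```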

    Computed Attributes List

    | Computed Attribute Name | Description | Type/Units | Required | Comment | Status |
    |---|---|---|---|---|---|
    | Invariant Data URL | a link to a (time-invariant) data product as of the moment the data is displayed; the data accessed with this URL will not change over the life of the system | URL | Mandatory | this is envisioned as the regular Data URL with a timeInstant string appended, to freeze the time reference for the data; note this is not when measurements were made, but when the ION knows about them. Any number of invariant IDs are possible; a unique one gets composed when a data product is advertised (made available), or upon user request. | LCA (placeholder), R2 (functional) |
    | Data Mode | The mode of data acquisition. The definitions and names of these modes are currently in revision, and should be checked in the [OOI Data Products document] when it is finalized | String (CV: tbd) | unknown | this may not be required, in the final analysis, or may just take the form of a tag. | R2 |
    | Metadata creation date** | Time when the metadata was generated | Timestamp | Mandatory | For simplicity in R2, this can be the timestamp of the current presentation of the attributes (if it is too complex to track the change date of each attribute and calculate the last change time). | R2 |
    | Provenance | description of how the data came into being | Provenance Description (Object) | Mandatory | includes who, when | R2 |
    | Geospatial bounds** | the geospatial coverage of the data product | Geospatial Bounds (Object) | Mandatory | Required for most metadata standards, including NetCDF ACDD (note the specific Recommended fields in the ACDD). | LCA (placeholder), R2 (functional) |
    | Bottom depth bounds** | The minimum and maximum bottom depth, when known, of the region covered by the data product. | Geospatial Bounds (Object) | Optional | In R2, only presented for those data sets for which it is already known | R2 |
    | Time bounds** | temporal coverage of the data product | timeInterval Object | Mandatory | Required by many metadata standards; does not include other information about time like uncertainty (note the specific Recommended fields in the ACDD). | LCA |
    | Data bounds | the max and min value for each parameter/variable in the dataset | as appropriate for the parameter(s) | Optional (0..*) | Useful for quick data assessment | R2 |
    | External Archive References | information about archival of this Data Product to an external system | External Archival object | Optional | Useful to have if known | R2 |
    | Number_active_subscriptions | The number of current subscriptions to the data | Integer | Mandatory | Needed for data curation | R2 |
    | Active_subscriber_information** | Information about the active subscriptions, including usernames and date of subscription | Active_subscriber_info (Object) | Mandatory | Needed for data curation | R2 |
    | Number_past_subscribers | The number of past, terminated subscriptions | Integer | Mandatory | Needed for data curation. | R2 |
    | Past_subscriber_info** | The username and start and end dates of closed subscriptions | past_subscriber_info (Object) | Mandatory | Needed for data curation | R2 |
    | Number_downloads | The number of downloads (e.g. access not via subscription but via a file download option) | Integer | Mandatory | Needed for data curation | R2 |
    | Download_information** | Information on the dates, invariant URLs and usernames for downloads | download_information (Object) | Mandatory | Needed for data curation | R2 |
    | Number_invariant_URLs | The number of invariant URLs that have been provided | Integer | Mandatory | Needed for data curation | R2 |
    | Invariant_URL_information** | information on the dates and associated users for invariant URLs | Invariant_URL_information (Object) | Mandatory | Needed for data curation | R2 |
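Several of the computed attributes above are plain derivations from stored data. As a hedged sketch, the code below computes Data bounds as a per-parameter (min, max) pair and derives the two subscription counts from their companion information lists. The record layout (one dict of parameter values per record) and the function names are assumptions for illustration, not the system's actual data model.

```python
def data_bounds(records):
    """Compute (min, max) per parameter across a list of {parameter: value} dicts."""
    bounds = {}
    for record in records:
        for name, value in record.items():
            low, high = bounds.get(name, (value, value))
            bounds[name] = (min(low, value), max(high, value))
    return bounds


def subscription_counts(active_info, past_info):
    """Derive the count attributes from their companion information lists."""
    return {
        "Number_active_subscriptions": len(active_info),
        "Number_past_subscribers": len(past_info),
    }


# Hypothetical records; a parameter may be missing from some records.
records = [
    {"temperature": 10.0, "salinity": 35.1},
    {"temperature": 8.5},
    {"temperature": 11.2, "salinity": 34.8},
]

print(data_bounds(records))
print(subscription_counts(active_info=["user-a", "user-b"], past_info=["user-c"]))
```

Deriving the counts from the information lists, rather than storing them separately, keeps the two from drifting apart; whether the system does this is not stated on this page.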

    **These attributes may be composed or not, either because external data will differ from OOI data (many of these can be composed for OOI data, but presumably not for external data) or because they represent a composite attribute that has components that can and can't be composed.

    Associations List

    | Subject | Predicate | Object | Comments |
    |---|---|---|---|
    | Data Product | hasDataset | Data Set | |
    | Data Product | hasProducer | Data Producer | |
    | Instrument | hasProduct | Data Product | |
    | Data Process | hasProduct | Data Product | |
    | Data Product | hasEventLog | Attachment | |
    | Data Product | deprecatesResource | Data Product | creates a new version |
    | Data Product | hasCQFlags | Information Resource | |
    | Data Product | hasDerivedQCProduct | Information Resource | |
    | Data Product | hasAnnotation | Attachment | |
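The associations above can be read as subject-predicate-object triples. The sketch below stores a few as tuples and follows deprecatesResource links to find the current version of a product. The resource identifiers and function names are hypothetical, and the direction of deprecatesResource (the newer product is the subject, the older one the object) is an assumption based on the "creates a new version" comment.

```python
# Hypothetical association triples: (subject, predicate, object).
ASSOCIATIONS = [
    ("instrument-1", "hasProduct", "dp-v1"),
    ("dp-v1", "hasDataset", "dataset-1"),
    ("dp-v2", "deprecatesResource", "dp-v1"),  # assumed: dp-v2 supersedes dp-v1
    ("dp-v3", "deprecatesResource", "dp-v2"),
]


def objects_of(subject, predicate, triples):
    """Return all objects linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def current_version(product_id, triples):
    """Follow deprecatesResource links until reaching an undeprecated product."""
    deprecated_by = {o: s for s, p, o in triples if p == "deprecatesResource"}
    seen = set()  # guards against accidental cycles in the links
    while product_id in deprecated_by and product_id not in seen:
        seen.add(product_id)
        product_id = deprecated_by[product_id]
    return product_id


print(objects_of("instrument-1", "hasProduct", ASSOCIATIONS))
print(current_version("dp-v1", ASSOCIATIONS))
```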

    Referenced Object Types

    | Name of Object | Definition | Type/Units | Comments |
    |---|---|---|---|
    | External Archival | Information about the data's archival to an external repository. Consists of: (1) Archival name: indicates where the data product has been archived, if it has been submitted to an external repository such as the National Oceanographic Data Center (CV string); (2) Archive Date: the last date and time this data product was submitted to an external repository (timeInstant); (3) External Archive URL: the URL of the data product within an external repository (string, URL); (4) External Archive ID: the unique ID of the resource in an external repository (string); (5) the contact information for the archive POC | | assume maximum of 1 external archive per Data Product in Release 2 |
    | Provenance Description | describes how the Data Product was calculated, ideally in computable form | | recommend using SSDS format as baseline |
    | active_subscription_information | provides the user contact info and date initiated for all active subscriptions | Contact (Object) and datetime | |
    | past_subscription_information | provides the user contact information, date initiated, and date terminated, for all past subscriptions | Contact (Object), datetimes, strings (for email) | |
    | download_information | The dates and user contact information for data downloads | datetimes, contact (Object) | |
    | invariant_URL_information | The list of Invariant URLs that have been provided, and for each the dates and user contact info | strings (URLs), datetimes, Contact (Object) | |
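To make the External Archival object concrete, here is a minimal sketch of it as a Python dataclass, together with a setter that enforces the Release 2 assumption of at most one external archive per Data Product. The field names, the dictionary-based product, and the example URL and ID are all hypothetical; only the five components and the one-archive limit come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ExternalArchival:
    archival_name: str          # CV string naming the external repository
    archive_date: datetime      # last submission time (timeInstant)
    external_archive_url: str   # URL of the product within the repository
    external_archive_id: str    # unique ID of the resource in the repository
    archive_contact: Optional[str] = None  # contact information for the archive POC


def set_external_archive(product, archival):
    """Attach the single external archive reference assumed for Release 2."""
    if product.get("external_archive") is not None:
        raise ValueError("Release 2 assumes at most one external archive per Data Product")
    product["external_archive"] = archival
    return product


archival = ExternalArchival(
    archival_name="Example Repository",                          # hypothetical
    archive_date=datetime(2012, 12, 20, tzinfo=timezone.utc),
    external_archive_url="https://archive.example.org/dp/123",   # hypothetical
    external_archive_id="EX-123",                                # hypothetical
)
product = set_external_archive({"name": "Example Data Product"}, archival)
print(product["external_archive"].external_archive_id)
```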

    Notes and Questions List

    See R2 Resource Page for Data Product-Discussion.

    data URL: ISO 19115 recommends giving the URL for accessing the data product (CI_OnlineResource). Are the variant/invariant IDs going to be URLs, or composable into URLs? (JGraybeal)
    JBG: The ID of the resource must be a URL, or we will have to add a URL to serve this purpose. (The Resource resource type has been edited to indicate the value of having the ID be a URL.) This resource ID (or similar) will be a variant ID by default. The invariant ID can be composed from the variant URL and the time at which the data is to be 'frozen'.
    DOI: How would DOIs be captured, given that they can be associated with either a variant or an invariant version of a dataset? There may be multiple DOIs per data product. Could there be DOIs associated with multiple products? Is a registry approach needed? (JGraybeal)
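The data URL note above says the invariant ID can be composed from the variant URL and the time at which the data is to be frozen. One possible composition, sketched here with the Python standard library, appends the time mark as a query parameter. The parameter name `timeInstant`, the compact timestamp format, and the example URLs are assumptions; the notes do not define the actual URL syntax.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def compose_invariant_url(data_url, time_instant, param="timeInstant"):
    """Append a UTC time mark to the variant Data URL as a query parameter.

    time_instant is expected to be a timezone-aware datetime.
    """
    stamp = time_instant.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    parts = urlsplit(data_url)
    # Drop any earlier time mark so the result carries exactly one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, stamp))
    return urlunsplit(parts._replace(query=urlencode(query)))


frozen_at = datetime(2012, 12, 20, tzinfo=timezone.utc)
print(compose_invariant_url("https://data.example.org/dp/123", frozen_at))
print(compose_invariant_url("https://data.example.org/dp/123?format=nc", frozen_at))
```

Because the time mark is part of the URL itself, any number of invariant URLs can exist for one variant URL, which matches the note that a unique one is composed each time a product is advertised or requested.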