
Overview of "Version Data Set" Use Case

Resource is supplanted by changed version


Tip: Key Points
UC Priority = 4 or 5: Critical, is in R2
Only boldface steps are required
<#> before a step -> lower priority
(optional) -> run-time option

Metadata

Refer to the Product Description and Product Description Release 2 pages for metadata definitions.

Actors: Data Provider
References: UC.R1.24 Version A Resource (see also Comments below)
Uses: UC.R2.20 Annotate Resource in Registry
Is Used By:
In Acceptance Scenarios: AS.R2.03A Modelers Integrate External Model with OOI, AS.R2.04A Data Product Leads Drive Core Data Product Creation
Extends:
Is Extended By:
Technical Notes:
  • Use Case applies to data sets only, not other resource types.
  • A new version of a data set deprecates the previous data, and must be declared by the data provider.
  • A new data granule or packet (supplement) of a data stream does not create a newly versioned data set.
  • EUC applies only to step 3; the rest is PGM.
Lead Team: DM
Primary Service: OOI Common Data & Metadata Model Part 2
Version: 2.6
UC Priority: 4
UC Status: Mapped + Ready
UX Exposure: PGM, (#3) EUC

Summary

This information summarizes the Use Case functionality.

Updates to a data set arrive in the Integrated Observatory, for instance via a data stream. The provider may declare that these updates constitute a new data set version. The Integrated Observatory detects the new version from that declaration (carried in the associated metadata) and creates the associations that document the versioning relationship for any services that depend on it.

Assumptions

  • This use case is about versioning data sets, not resources in general. The version of a data set relates to whether the digital content of that data set has been deprecated, not whether the metadata that makes up the resource description has changed.
  • The key concept in determining whether one data set is a different version of another is whether the original is effectively replaced (= deprecated).
  • Things that are not versions: when fundamentally different data sets are created (this is a transformation); when data are added to a data set (this is a supplement); when additional data values are output, so the syntactical description of the data has to change (this is a different data set, hence a transformation).
  • Where a comparable data set is created that does not deprecate the original, for example by using an alternative algorithm to produce the same data package (but with different values), the relationship between the two data products is not versioning. (We call this a data set variant, and it is treated as a related but separate data set.)
  • The provider of the data set can uniquely specify the data set being replaced (likely by an Integrated Observatory unique identifier, but other methods may be supported).
  • The provider of the data is assumed to understand versions as defined here and to declare them correctly. (See the Comments section below.)
  • The policy for deletion of versions is the same as the policy for deletion of the original resource of that type. In particular, for resources that are not deleted, their deprecated versions are not deleted either.
  • In Release 2, each version may be replaced by only one version.
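
The distinctions drawn in the assumptions above can be read as a small decision table. The sketch below is illustrative only: the names Relationship and classify, the three boolean inputs, and the order in which they are checked are all assumptions made for this example, not part of the use case. In particular, the use case relies on the provider's declaration rather than on any automatic classification.

```python
from enum import Enum


class Relationship(Enum):
    """Hypothetical labels for how new data relates to an existing data set."""
    VERSION = "version"                # deprecates (replaces) the original
    SUPPLEMENT = "supplement"          # adds data to the same data set
    TRANSFORMATION = "transformation"  # a different data set
    VARIANT = "variant"                # comparable; the original stays valid


def classify(different_data_set: bool, only_adds_data: bool,
             deprecates_original: bool) -> Relationship:
    """Map three yes/no facts about an update to a relationship.

    different_data_set: the content is fundamentally different, or the
        syntactic description of the data had to change.
    only_adds_data: the update only appends data to the data set.
    deprecates_original: the original is effectively replaced.
    The order of the checks is an assumption made for this sketch.
    """
    if different_data_set:
        return Relationship.TRANSFORMATION
    if only_adds_data:
        return Relationship.SUPPLEMENT
    if deprecates_original:
        return Relationship.VERSION
    return Relationship.VARIANT


# A recalibrated copy that replaces the whole data set:
print(classify(different_data_set=False, only_adds_data=False,
               deprecates_original=True).value)  # version
```

Only the VERSION outcome is in scope for this use case; the other three outcomes are handled as separate or extended data sets.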

Initial State

A Data Provider has registered a Data Source and provided data input for ingestion into the Integrated Observatory, which has associated all input from this Data Source with a Data Set. The provider has new data to input which should replace the previously provided data.

Scenario for "Version Data Set" Use Case

  1. The Data Provider submits data input for a pre-existing Data Source; the input is marked as a new version, or replacement, of an existing resource.
    1. Note: the unit of input (data packet or granule) does not have to be the same as the unit in which the data was originally provided.
    2. The unit of data input may be an entire "data set" as understood by the data provider.
    3. In Release 2, this operation may be limited to replacing the entire data set, not just a part of it.
  2. The Integrated Observatory ingests the data packet and creates a new version of the uniquely identified Data Set associated with the Data Source.
    1. The version metadata for the Data Set enables access to the current and previous versions of the Data Set.
    2. The current version "is successor version of" the previous data.
    3. All ingested Data Sets, including previous versions, remain immutable. They can be accessed as needed, upon request through the user interface.
  3. <3> The Data Consumer can access and navigate Data Set versions.
    1. This includes finding the first, previous, next, and last (current) version of a Data Set.
    2. "Current version" must refer to the last version of the chain, not necessarily the most recently received.
    3. This is a way of traversing a specific type of association between immutable information resources.
    4. Appropriate user operations are allowed on any particular version.
    5. The only version presented by default to users is the one that is not deprecated; the user must explicitly request other versions in order to see them.
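
The scenario steps above can be sketched as a small in-memory registry. This is a minimal illustration under stated assumptions, not the Integrated Observatory's implementation: DataSetVersion, VersionRegistry, the replaces argument (standing in for the provider's declaration in metadata), and every method name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataSetVersion:
    """One ingested Data Set; frozen so that it stays immutable (step 2.3)."""
    dataset_id: str
    content: str


class VersionRegistry:
    """Hypothetical in-memory store of Data Sets and their version links."""

    def __init__(self) -> None:
        self._versions: Dict[str, DataSetVersion] = {}
        self._successor: Dict[str, str] = {}    # replaced id -> replacing id
        self._predecessor: Dict[str, str] = {}  # replacing id -> replaced id

    def ingest(self, version: DataSetVersion,
               replaces: Optional[str] = None) -> None:
        """Store a Data Set; `replaces` is the provider's version declaration."""
        if version.dataset_id in self._versions:
            raise ValueError("ingested Data Sets are immutable")
        if replaces is not None:
            if replaces not in self._versions:
                raise KeyError(replaces)
            if replaces in self._successor:
                # Release 2: each version may be replaced by only one version.
                raise ValueError(f"{replaces} already has a successor")
            self._successor[replaces] = version.dataset_id
            self._predecessor[version.dataset_id] = replaces
        self._versions[version.dataset_id] = version

    def next_version(self, dataset_id: str) -> Optional[str]:
        return self._successor.get(dataset_id)

    def previous_version(self, dataset_id: str) -> Optional[str]:
        return self._predecessor.get(dataset_id)

    def first_version(self, dataset_id: str) -> str:
        while dataset_id in self._predecessor:
            dataset_id = self._predecessor[dataset_id]
        return dataset_id

    def current_version(self, dataset_id: str) -> str:
        # Follows successor links to the end of the chain (step 3.2);
        # arrival order is not consulted.
        while dataset_id in self._successor:
            dataset_id = self._successor[dataset_id]
        return dataset_id

    def default_listing(self) -> List[str]:
        """Ids shown by default: only versions that are not deprecated."""
        return [i for i in self._versions if i not in self._successor]


registry = VersionRegistry()
registry.ingest(DataSetVersion("ctd-v1", "raw"))
registry.ingest(DataSetVersion("ctd-v2", "recalibrated"), replaces="ctd-v1")
print(registry.current_version("ctd-v1"))  # ctd-v2
print(registry.default_listing())          # ['ctd-v2']
```

Storing the successor link outside the frozen record is a deliberate choice in this sketch: deprecating a version adds an association without modifying the ingested Data Set itself, which mirrors the description of versions as associations between immutable resources.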

Final State

A new version of the Data Set exists and its relation to the previous version is established and accessible.

Comments

These comments provide additional context (usually quite technical) for editors of the use case.

The Cyberinfrastructure project's application of the term 'version' to data sets differs from its use in the Data Management Plan (see Version 1-25 of that plan, sections 3-13 and 3-21). A version in this use case (and in the Product Specification) refers explicitly to cases where one data set deprecates another. (The cases in the Data Management Plan refer to different data products, in the language of the Cyberinfrastructure documents.)

The original steps were extremely generic, to support versioning as an abstract concept that could be defined by anyone across any resource type. The alternative design presented here, less flexible but more straightforward, requires that new resource versions be identified by the submitter, so that the system knows to apply the appropriate annotations. With this change, most previous Comments are now moot.

In principle, there can be multiple versioning axes, so a single resource could be replaced by two or three resources of different provenance and purpose. Eventually there may be types of versions; these may reflect the reason for the replacement (replaces_bug_fix, replaces_calibration_error) or its relevance (replaces_model_outputs_unchanged). This is a subtlety to be explored in later releases.

Over time, we will learn to recognize typical cases where the provider misunderstands versions and uses them incorrectly. In particular, a provider may deliver a supplement by replacing the entire data set and calling it a new version. Of course, if the data format is opaque, we have no way to detect this.

In later releases, it may prove appropriate for the system to automatically detect new versions of data.

A View on States etc. (a PPT attached to the Conversations and Commitments page as ConversationalProcessing_CI_2011-03-06_ver_1-00.pptx; see slides 8 and 9) provides a concept of versions as a succession of states (immutable values), which together represent an identity. Slide 9 is consistent with the view of versions expressed above.

Versioning, Provenance, and Related Concepts has a detailed discussion of some of the complexities that should be addressed, or explicitly avoided, in the general case. As this was written before the use case was simplified, many of those concepts are no longer central.

  1. Oct 16, 2011

    Alan Chave says:

    Assumption 1 makes no sense. There are statements about what constitutes a version in the DMP, but the one stated is not in the list.

    Primary service should be Data Ingestion Service