Resource is supplanted by changed version
|References||UC.R1.24 Version A Resource
(see also Comments below)
|Uses||UC.R2.20 Annotate Resource in Registry|
|Is Used By|
|In Acceptance Scenarios||AS.R2.03A Modelers Integrate External Model with OOI, AS.R2.04A Data Product Leads Drive Core Data Product Creation|
|Is Extended By|
|Technical Notes||Use Case applies to data sets only, not other resource types.
A new version of a data set deprecates the previous data, and must be declared by the data provider.
A new data granule or packet (supplement) of a data stream does not create a newly versioned data set.
EUC only applies to step 3, the rest is PGM.
|Primary Service||OOI Common Data & Metadata Model Part 2|
|UC Status||Mapped + Ready|
|UX Exposure||PGM, (#3)EUC|
This information summarizes the Use Case functionality.
Updates to a data set arrive in the Integrated Observatory, for instance via a data stream. The provider may declare that these updates are a new data set version. The Integrated Observatory detects the new version via the declaration (in associated metadata), and makes the appropriate associations to document the versioning relationship for services that care about it.
- This use case is about versioning data sets, not resources in general. The version of a data set relates to whether the digital content of that data set has been deprecated, not whether the metadata that makes up the resource description has changed.
- The key concept in determining whether one data set is a different version of another is whether the original is effectively replaced (= deprecated).
- Things that are not versions: when fundamentally different data sets are created (this is a transformation); when data are added to a data set (this is a supplement); when additional data values are output, so the syntactical description of the data has to change (this is a different data set, hence a transformation).
- In the case where a comparable data set is created that does not deprecate the original, as for example using an alternative algorithm to produce the same data package (but with different values), the relationship between the two data products is not versioning. (We call this a data set variant, and it is treated as a related but separate data set.)
- The provider of the data set can uniquely specify the data set being replaced (likely by an Integrated Observatory unique identifier, but other methods may be supported).
- The provider of the data knows what they are talking about when it comes to versions. (See comments section below.)
- The policy for deletion of versions is the same as the policy for deletion of the original resource of that type. In particular, for resources that are not deleted, their deprecated versions are not deleted either.
- In Release 2, each version may be replaced by only one version.
A Data Provider has registered a Data Source and provided data input for ingestion into the Integrated Observatory, which has associated all input from this Data Source with a Data Set. The provider has new data to input which should replace the previously provided data.
- The Data Provider submits data input for a pre-existing Data Source that is marked as a new version, or replacement, of an existing resource.
- Note: the unit of input (data packet or granule) does not have to be the same as the unit in which the data was originally provided
- The unit of data input may be an entire "data set" as understood by the data provider
- In Release 2, this operation may be limited to replacing the entire data set, not just a part of it.
- The Integrated Observatory ingests the data packet and creates a new version of the uniquely identified Data Set associated with the Data Source.
- The version metadata for the Data Set enables access to the current and previous versions of the Data Set.
- The current version "is successor version of" the previous data.
- All ingested Data Sets, including previous versions, remain immutable. They can be accessed as needed, upon request to the user interface.
- <3> Data Consumer can access and navigate Data Set versions
- Including finding the first, previous, next, and last (current) version of a Data Set.
- "Current version" must refer to the last version of the chain, not necessarily the most recently received.
- This is a way of traversing a specific type of association between immutable information resources
- Appropriate user operations are allowed on any particular version.
- The only version presented by default to users is the one that is not deprecated; the user must explicitly request other versions in order to see them.
A new version of the Data Set exists and its relation to the previous version is established and accessible.
These comments provide additional context (usually quite technical) for editors of the use case.
The Cyberinfrastructure project's application of the term 'version' to data sets is different than the use in the Data Management Plan (see Version 1-25 of that plan, sections 3-13 and 3-21). A version in this use case (and in the Product Specification) refers explicitly to cases where one data set deprecates another. (The cases in the Data Management Plan refer to different data products, in the language of the Cyberinfrastructure documents.)
The original steps were extremely generic, to support versioning as an abstract concept that could be defined by anyone across any resource type. The alternative design presented here, less flexible but more straightforward, requires that new resource versions are identified by the submitter, so that the system knows to apply the appropriate annotations. With this change, most previous Comments are now moot.
In principle, there can be multiple versioning axes, so a single resource could be replaced with 2 or 3 resources of different provenance and purpose. There may be types of versions eventually; these may reflect the reason for the replacement (replaces_bug_fix, replaces_calibration_error) or its relevance (replaces_model_outputs_unchanged). This is a subtlety to be explored in later releases.
We will become alert in time to typical cases where the provider does not know about versions and uses them incorrectly. In particular, a provider may provide supplements by replacing the entire data set and calling it a new version, when really it is simply a supplement. Of course, if the data format is opaque, we have no way to detect this.
In later releases, it may prove appropriate for the system to automatically detect new versions of data.
[A View on States etc.|Conversations and Commitments^ConversationalProcessing_CI_2011-03-06_ver_1-00.pptx] (a PPT, see slides 8,9) provides a concept of versions as a succession of states (immutable values), which together represent an identity. Slide 9 is consistent with the view of versions expressed above.
Versioning, Provenance, and Related Concepts has a detailed discussion of some of the complexities that should be addressed, or explicitly avoided, in the general case. As this was written before the use case was simplified, many of those concepts are no longer central.