
Technologies

As of OOI-CI Release 2.0, the technologies selected to present a DAP interface for OOI datasets are ERDDAP and PyDAP. PyDAP provides a direct DAP interface to the data and metadata available on coverage instances. ERDDAP hosts a catalog of datasets and provides clients with a user interface and with catalog, search and data manipulation functions. ERDDAP also provides clients with multiple data interfaces and formats, such as NetCDF, comma-separated values (CSV), MATLAB (MAT), OPeNDAP and ASCII.

ERDDAP

ERDDAP runs on a dedicated host separate from the OOI-CI container systems. Each ERDDAP instance has a configured Apache Tomcat server that hosts the ERDDAP servlet. Every instance has its own configuration, which customizes the interface and sets the data endpoints and dataset parameters such as update frequency. Each Tomcat server is configured to run a single servlet, which is ERDDAP.

PyDAP

PyDAP can run independently of a container if configured to do so, but as of Release 2.0 it runs on multiple capability containers as a Web Server Gateway Interface (WSGI) application. The capability container hosts a gevent-based WSGI server that manages client requests and dispatches each one to an appropriate thread to handle it. PyDAP manages the basic components of the OPeNDAP request, such as identifying the request and the dataset, parsing the URL, etc. The PyDAP Coverage Handler, which is a custom extension, manages the response to the OPeNDAP requests by opening a coverage model instance and querying the coverage about data and metadata. Once the query has been answered, the coverage handler formats the data into a PyDAP data structure and returns the structure to PyDAP, where it is translated into a raw OPeNDAP response and sent to the client.
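The request flow described above can be sketched as a small WSGI application. This is a minimal illustration using only the standard library, not PyDAP's actual code: the names split_request and app are hypothetical, and the step where the coverage handler would open a coverage instance is left as a comment.

```python
# DAP2 response suffixes recognised by this sketch.
RESPONSES = {"das", "dds", "dods"}


def split_request(path_info):
    """Split '/<dataset>.<response>' into (dataset_id, response)."""
    dataset_id, dot, response = path_info.lstrip("/").rpartition(".")
    if not dot or not dataset_id or response not in RESPONSES:
        raise ValueError("not an OPeNDAP request: %r" % path_info)
    return dataset_id, response


def app(environ, start_response):
    """WSGI entry point: identify the dataset and response type, then reply."""
    try:
        dataset_id, response = split_request(environ.get("PATH_INFO", ""))
    except ValueError as exc:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [str(exc).encode("utf-8")]
    # Assumption: a real handler would open the coverage for dataset_id
    # here and build the DAS, DDS or data response from it.
    body = "dataset=%s response=%s\n" % (dataset_id, response)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]
```

Any WSGI server, gevent-based or otherwise, could host app; the sketch only shows where URL parsing ends and dataset-specific handling would begin.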

PyDAP doesn't have a service definition. It runs completely outside the scope of the capability containers and should not interface directly with any container services, attributes or CI system. It only interfaces with the coverage instances stored on the filesystem available to the host.

Cataloging Data

Once a data product is activated, a registration process adds a reference to that data product in a datasets.xml catalog that is shared between ERDDAP and the CI system. datasets.xml holds references to the PyDAP URLs, descriptors and attributes about the data, basic provenance and licensing of the data, as well as dataset-specific attributes such as reload times. Generally the catalog is stored on a shared filesystem. When the registration process is first started, it initializes datasets.xml to a blank but well-defined catalog so that ERDDAP and the system recognize that there are no data. Each dataset entry in the catalog specifies a unique identifier for the dataset, which directly corresponds to the resource identifier for the data product. Each entry also lists the variables available in the data and, for each variable, a desired data type that ERDDAP can attempt to convert to. Some special data types, like dates and times, have special ERDDAP attributes that can be used. A detailed description of the capabilities of the catalog is given in the reference below. [1]
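As a rough illustration of the registration step, the sketch below builds a blank catalog and appends one dataset entry with Python's xml.etree.ElementTree. The element names follow ERDDAP's datasets.xml conventions, but the dataset type, the identifier, the PyDAP URL and the variable names are assumptions chosen for the example, and the real registration process may write additional elements.

```python
import xml.etree.ElementTree as ET


def blank_catalog():
    """Return the root of an empty but well-formed catalog."""
    return ET.Element("erddapDatasets")


def add_dataset(root, dataset_id, source_url, variables, reload_minutes=60):
    """Append one dataset entry; variables is a list of (name, data_type)."""
    ds = ET.SubElement(root, "dataset", {
        "type": "EDDTableFromDapSequence",  # assumed dataset type
        "datasetID": dataset_id,            # resource id of the data product
        "active": "true",
    })
    ET.SubElement(ds, "sourceUrl").text = source_url
    ET.SubElement(ds, "reloadEveryNMinutes").text = str(reload_minutes)
    for name, data_type in variables:
        var = ET.SubElement(ds, "dataVariable")
        ET.SubElement(var, "sourceName").text = name
        ET.SubElement(var, "destinationName").text = name
        ET.SubElement(var, "dataType").text = data_type
    return ds


root = blank_catalog()
add_dataset(
    root,
    "data_product_id",                        # hypothetical identifier
    "http://localhost:8001/data_product_id",  # hypothetical PyDAP URL
    [("temp", "float"), ("salinity", "float")],
)
catalog_xml = ET.tostring(root, encoding="unicode")
```

Starting from blank_catalog() mirrors the initialization described above: the root element exists even when no datasets are registered, so a reader of the file sees an empty catalog rather than a missing one.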

[1] Working with the datasets.xml File

OPeNDAP Request Handling

Dataset Attribute Structure

The "Dataset Attribute Structure" (DAS) is used to store attributes for variables in the dataset. An attribute is any piece of information about a variable that the creator wants to bind with that variable excluding the type and size, which are part of the DDS. Typical attributes might range from error measurements to text describing how the data was collected or processed.
In principle, attributes are not processed by software, other than to be displayed. However, many systems rely on attributes to store extra information that is necessary to perform certain manipulations of data. In effect, attributes are used to store information that is used "by convention" rather than "by design". OPeNDAP can effectively support these conventions by passing the attributes from data set to user program via the DAS. (Of course, OPeNDAP cannot enforce conventions in datasets where they were not followed in the first place.)
The syntax for attributes in a DAS is given in the table below. Every attribute of a variable is a triple: attribute name, type and value. The name of an attribute is an identifier, consisting of alphanumeric characters, plus "_" and "/". The type of an attribute may be one of: "Byte", "Int32", "UInt32", "Float64", "String" or "Url". An attribute may be scalar or vector. In the latter case the values of the vector are separated by commas (,) in the textual representation of the DAS.
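To make the name, type and value triple concrete, here is a minimal sketch that renders triples in the textual DAS layout. The function names and the sample attributes are hypothetical, and the sketch does not escape quotes inside string values.

```python
def format_attribute(name, attr_type, value):
    """Render one (name, type, value) triple as a DAS attribute line."""
    values = value if isinstance(value, (list, tuple)) else [value]
    if attr_type in ("String", "Url"):
        rendered = ['"%s"' % v for v in values]
    else:
        rendered = [str(v) for v in values]
    # Vector values are separated by commas in the textual form.
    return "%s %s %s;" % (attr_type, name, ", ".join(rendered))


def format_das(variables):
    """variables maps a variable name to a list of attribute triples."""
    lines = ["Attributes {"]
    for var_name, attrs in variables.items():
        lines.append("    %s {" % var_name)
        for triple in attrs:
            lines.append("        " + format_attribute(*triple))
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


das_text = format_das({
    "temp": [
        ("units", "String", "degC"),               # scalar attribute
        ("valid_range", "Float64", [-2.0, 40.0]),  # vector attribute
    ],
})
```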

Dataset Descriptor Structure

In order to translate data from one data model into another, OPeNDAP must have some knowledge about the types of the variables that comprise a given data set, and about their semantics. It must also know something about the relations of those variables, even those relations which are only implicit in the dataset's own API. This knowledge about the dataset's structure is contained in a text description of the dataset called the "Dataset Descriptor Structure" (DDS).

The DDS does not describe how the information in the data set is physically stored, nor does it describe how the "native" API is used to access that data. Those pieces of information are contained in the API itself and in the OPeNDAP server, respectively. The DDS contains knowledge about the dataset variables and the interrelations of those variables. The server uses the DDS to describe the structure of a particular dataset to a client.

The DDS is a textual description of the variables and their classes that make up some data set. The DDS syntax is based on the variable declaration and definition syntax of C and C++. A variable that is a member of one of the base type classes is declared by writing the class name followed by the variable name. The type constructor classes are declared using C's brace notation. A grammar for the syntax is given in the table below. (Note that the Dataset keyword has the same syntactic function as Structure but is used for the specific job of enclosing the entire data set even when it does not technically need an enclosing element.)
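The declaration style described above can be sketched with two small helpers: one for base-type variables and one for brace-delimited constructor types. The helper names and the sample dataset are hypothetical; the sketch only illustrates the C-like layout and is not a full DDS grammar.

```python
def declare(base_type, name, dims=()):
    """Declare a base-type variable: class name, then variable name."""
    shape = "".join("[%s = %d]" % (dim, size) for dim, size in dims)
    return "%s %s%s;" % (base_type, name, shape)


def construct(kind, name, members):
    """Wrap member declarations in a brace-delimited constructor type."""
    body = []
    for member in members:
        body.extend("    " + line for line in member.splitlines())
    return "\n".join(["%s {" % kind] + body + ["} %s;" % name])


# Dataset plays the same syntactic role as Structure, but encloses
# the whole data set.
dds_text = construct("Dataset", "example", [
    declare("Float64", "time", [("time", 4)]),
    construct("Structure", "location", [
        declare("Float64", "lat"),
        declare("Float64", "lon"),
    ]),
])
```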

Data Transmission

An OPeNDAP server returns data to a client in response to a request URL composed of the dataset's root URL followed by the suffix ".dods". For example, if a data set is located at http://tests.opendap.org/data/mydata.dat then you'll find the data at http://tests.opendap.org/data/mydata.dat.dods
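The URL composition can be sketched in a few lines. The function name response_url is hypothetical; besides ".dods", the sketch also accepts the ".das" and ".dds" suffixes used for attribute and descriptor requests.

```python
def response_url(root_url, response="dods"):
    """Append a DAP2 response suffix to a dataset's root URL."""
    if response not in ("das", "dds", "dods"):
        raise ValueError("unknown response type: %r" % response)
    return "%s.%s" % (root_url.rstrip("/"), response)


data_url = response_url("http://tests.opendap.org/data/mydata.dat")
```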
The data is returned in a MIME document that consists of two parts: the DDS, and the data encoded according to the description in External Data Representation. (The returned document is sometimes called the DataDDS.) The two parts are separated by this string:
Data:<CR><NL>
The DDS included is modified according to any constraint expression that may have been applied. That is, the returned DDS describes the returned data.
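A client-side sketch of separating the two parts of a DataDDS is shown below. The function name is hypothetical, and the default separator follows the string given above; servers may differ in line endings, so the separator is a parameter. Decoding the binary part is out of scope here.

```python
def split_datadds(payload, separator=b"Data:\r\n"):
    """Split a DataDDS body into (dds_text, encoded_data_bytes)."""
    head, sep, data = payload.partition(separator)
    if not sep:
        raise ValueError("separator not found in response body")
    # The DDS part is plain text; the data part stays as raw bytes.
    return head.decode("ascii"), data


# Hypothetical response body: a tiny DDS followed by placeholder bytes.
body = b"Dataset {\n    Float64 time[time = 2];\n} example;\nData:\r\n" + b"\x00\x01"
dds_part, data_part = split_datadds(body)
```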
For example, consider a request for data from a data set with a DDS like this:

This is the DDS of a typical gridded dataset. Suppose, though, that you ask for only the time values of the data set. The DDS of the result will look like this:

This DDS will be included in the DataDDS return, ahead of the encoded array of 1857 64-bit time values.
For more information about sampling OPeNDAP data sets, see the section below about constraint expressions.

A request for data from an OPeNDAP client will generally make three different service requests, for data attributes (DAS), data descriptors (DDS), and for data. The prepackaged OPeNDAP clients do this for you, so you may not be aware that three requests