
See the Release 3 Milestone Summary page.

WBS 1.2.3.21.40.01
ID M112
Description Capability extensions to index and query datasets by geospatial metadata beyond geospatial points supported in R2
Deliverable ID D053
Deliverable Data Access Service
Development Owner M. Manning
Development Team Data Processing
Developers B. McKenna
Release R3
Status Done
Start Date 1/6/2014
End Date 4/2014
Comments  

Milestone Scoping and Requirements

Requirements and Capabilities Tracing

The requirements in this section come from the requirements spreadsheet finalized and agreed to between the CI Development Manager and CI System Engineer. That document is M112 Requirements Spreadsheet

The following table contains all of the Milestone Service Requirements
Reqt ID Requirement Text Rationale and Description Proposed Change  
L4-CI-DM-RQ-124 Search & Navigation      
L4-CI-DM-RQ-219 Search and navigation shall implement resource discovery through complex query  For example, resource type and a specific metadata attribute    
L4-CI-DM-RQ-91 Search and navigation shall implement geospatial query for resources For example, by specifying a lat/lon/depth box    
L4-CI-DM-RQ-194 Search and navigation shall combine resource discovery with resource access Subject to policy    
NEW DM-1     Search and Navigation Geospatial Query shall support bounding box spatial searches using the following spatial query operators from the OGC Reference Model:  a. overlaps; b. contains;  c. within; and d. disjoint. [https://github.com/ooici/coi-services/blob/master/ion/services/dm/presentation/test/discovery_test.py#L332-L421]
NEW DM-3   applies to all types of searches, including geospatial and temporal. Search and Navigation shall provide a count of the result-set from a search in near real-time. [https://github.com/ooici/coi-services/blob/master/ion/services/dm/presentation/test/discovery_test.py#L310-L321]
The following table contains the Milestone UX requirements
Reqt ID Requirement Text Proposed Change
L4 UX RQ 117 A user interface for search and navigation shall be provided  
L4 UX RQ 119 The search and navigation user interface shall present all supported query options  
NEW UX-1   A user interface for geospatial search shall be provided.
NEW   The Geospatial Search user interface shall allow a user to search for geolocated data products by specifying the latitude, longitude, and depth at which the data were collected.
NEW   The Geospatial Search user interface shall allow a user to search for deployed marine infrastructure resources by specifying the latitude, longitude, and depth at which the resource is deployed.
NEW   The Geospatial Search user interface shall allow a user to specify depth by entering or selecting values, units, and whether to measure up or down from the ocean floor or the ocean surface.
NEW   The Geospatial Search user interface shall allow a user to specify a  latitude and longitude bounding box  by entering values of north and south latitude and east and west longitude.
NEW   The Geospatial Search user interface shall allow a user to specify a latitude and longitude bounding box by drawing the bounding box on a map.
NEW   The Geospatial Search user interface shall allow a user to perform bounding box spatial searches using the following spatial query  operators from the OGC Reference Model:  a. overlaps; b. contains;  c. within; and d. disjoint.
NEW other search types include temporal search, search by name, and search by type The Geospatial Search user interface shall allow a user to combine a geospatial search with one or more other search types, by "and'ing" the geospatial search criteria with the search criteria from the other search type(s).
NEW   The Geospatial Search user interface shall provide contextual help.
NEW   The Geospatial Search user interface shall provide help documentation.
NEW   The search and navigation user interface shall allow a user to limit the number of results returned from a search query.
NEW   The search and navigation user interface shall present to  a user the number of results returned from a search query.

Use Cases

Find resources using OGC Reference Model Geospatial Queries

  • Overlaps
    • A user may select "overlaps" as the query operator. The result set is all resources whose geospatial region intersects the query bounding box.
  • Within
    • A user may select "within" as the query operator. The result set is all resources whose entire geospatial region is wholly contained within the query bounding box.
  • Contains
    • A user may select "contains" as the query operator. The result set is all resources whose geospatial region wholly contains the query bounding box.
  • Disjoint
    • A user may select "disjoint" as the query operator. The result set is all resources whose geospatial region does not intersect the query bounding box.
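
The four operators can be illustrated with a minimal Python sketch on axis-aligned boxes. It follows the definitions given in these use cases (where "overlaps" means "intersects"), not the stricter OGC definition of overlaps, and it is not the DiscoveryService or PostGIS implementation. The `BBox` class and `match` helper are hypothetical names, and antimeridian wrap-around is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned lon/lat box (hypothetical helper; no antimeridian handling)."""
    west: float
    south: float
    east: float
    north: float

    def overlaps(self, other: "BBox") -> bool:
        # True when the two boxes share at least one point.
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other: "BBox") -> bool:
        # True when `other` lies wholly inside this box.
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)

    def within(self, other: "BBox") -> bool:
        # True when this box lies wholly inside `other`.
        return other.contains(self)

    def disjoint(self, other: "BBox") -> bool:
        # True when the two boxes share no point.
        return not self.overlaps(other)


def match(operator: str, resource: BBox, query: BBox) -> bool:
    """Evaluate an operator from the resource's point of view."""
    if operator not in ("overlaps", "contains", "within", "disjoint"):
        raise ValueError("unknown spatial operator: " + operator)
    return getattr(resource, operator)(query)
```

With a query box of `BBox(-10, -10, 10, 10)`, a resource covering `BBox(-1, -1, 1, 1)` matches "within", while a resource covering `BBox(-20, -20, 20, 20)` matches "contains".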
Find data within bounding area that meets the additional condition of a variable (e.g., temperature) within a given range
This use case is under consideration.
  • Resources can be identified using geospatial search with data conditions.
    • A user can specify a range for a data variable and the results will include data products that intersect the range.
Advanced: Find out how many entities meet the search criteria.
  • Provide the UI with a count of the result-set from a query in near real-time.

Related Search issues in Jira

Steps

Milestone Testing

For skip (offset): first we create a large number of resources, then we test using the QueryLanguage and the Discovery Intermediate Format.

For the count of total results in the database, the query should return a single value.

For the enhanced geospatial, we run two sets of tests:

The first uses a WKT Polygon object to represent a box and test the overlaps, contains and within operators.

The second demonstrates the ability to specify a latitude/longitude point and an associated buffer (AKA radius) and search relative to the circle formed by that radius.
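
As an illustration of the first test set, a bounding box can be written as a closed WKT polygon ring. The sketch below is a hypothetical helper (`bbox_to_wkt` is not part of coi-services or pyon) and assumes x = longitude, y = latitude.

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT POLYGON ring for a lon/lat bounding box."""
    if west > east or south > north:
        raise ValueError("expected west <= east and south <= north")
    # A WKT ring must repeat its first vertex as its last vertex.
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    ring = ",".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"
```

For example, `bbox_to_wkt(0, 0, 4, 4)` yields `POLYGON((0 0,4 0,4 4,0 4,0 0))`.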

Milestone Pull Requests

Additional search parameters (skip) and query for total results in DB

Milestones - Tasks - Actions addressed

  • M112-T131-A002: Return number of total results available in datastore from DiscoveryService
  • M112-T131-A005: DiscoveryService processes an offset (skip) parameter for pagination

DiscoveryService currently supports a 'limit' parameter, allowing the client to specify how many results to return from server. An additional 'skip' parameter is implemented to allow the client to request the next set of n results, skipping the first m results. This addition will allow the UI to implement pagination. For example, to get results 501 to 600 from the server:

 {'limit': 100, 'skip': 500}
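
A minimal sketch of how a client might derive these arguments from a page number. `page_args` and `apply_paging` are hypothetical helpers, and the in-memory slice only mimics what the server would do with `limit` and `skip`.

```python
def page_args(page, page_size=100):
    """Return limit/skip arguments for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {'limit': page_size, 'skip': (page - 1) * page_size}


def apply_paging(rows, args):
    """Mimic the server side: skip the first m rows, return the next n."""
    start = args.get('skip', 0)
    return rows[start:start + args['limit']]
```

Page 6 with a page size of 100 gives `{'limit': 100, 'skip': 500}`, which corresponds to results 501 to 600.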

DiscoveryService can now also be used to obtain the 'total number of results available' in the datastore. Using the standard query object and adding

 search_args={'count': True}

will return an array containing a single integer value, the total number of results available in the datastore. This call should be followed (or preceded) by the same query object without the {'count':True} to obtain the actual results with the limit and/or offset applied.
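
The two-call pattern can be sketched as follows. `fake_discovery_query` is a hypothetical in-memory stand-in for the service call, not the DiscoveryService API; only the `count`, `limit` and `skip` keys come from this page.

```python
import math


def fake_discovery_query(rows, search_args=None):
    """Hypothetical stand-in for a DiscoveryService query over `rows`."""
    search_args = search_args or {}
    if search_args.get('count'):
        # Count mode: a single-element array holding the total.
        return [len(rows)]
    skip = search_args.get('skip', 0)
    limit = search_args.get('limit', len(rows))
    return rows[skip:skip + limit]


def paging_summary(rows, limit, skip):
    """Issue the count query, then the paged query, and summarize both."""
    total = fake_discovery_query(rows, {'count': True})[0]
    results = fake_discovery_query(rows, {'limit': limit, 'skip': skip})
    return {
        'total': total,
        'pages': math.ceil(total / limit),
        'showing': (skip + 1, skip + len(results)),
    }
```

A UI could use such a summary to render text like "showing 501-600 (14,567 available)".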

Implementation of WKT string for geospatial query (with optional buffer specified); this supports the point-with-radius search

Milestones - Tasks - Actions addressed

  • M112-T133-A001: Implement WKT qmatcher in ds_discovery
  • M112-T133-A002: Pyon DatastoreQuery language support for WKT
  • M112-T133-A003: Pyon PostgresQuery language support for WKT

DiscoveryService query can now recognize a request of the form:

 {'wkt': <WKT_STRING>, 'buffer': <decimal_degrees>}  

This capability supports the requirement for point-based (with radius) search to be made available in the UI. The point must be encoded in WKT format, e.g. POINT(10.0 10.0) (WKT separates coordinates with a space, not a comma), and the associated buffer parameter will be used for the radius. This geospatial object (point with buffer) is passed directly to the PostGIS layer, which makes it an efficient way to test a circle against the stored locations in the database.
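
A minimal sketch of building such a request and of what the circle test means. Both helpers are hypothetical; the membership test is a planar approximation in decimal degrees and only illustrates the idea, since the actual comparison is performed by PostGIS.

```python
import math


def point_buffer_query(lon, lat, buffer_deg):
    """Build a query dict with a WKT point (x = lon, y = lat) and a buffer."""
    if buffer_deg < 0:
        raise ValueError("buffer must be non-negative")
    # WKT separates the coordinates of a point with a space.
    return {'wkt': f"POINT({lon} {lat})", 'buffer': buffer_deg}


def in_buffer(query_lon, query_lat, buffer_deg, lon, lat):
    """Planar test: is (lon, lat) inside the circle around the query point?"""
    return math.hypot(lon - query_lon, lat - query_lat) <= buffer_deg
```

For example, `point_buffer_query(10.0, 10.0, 0.5)` returns `{'wkt': 'POINT(10.0 10.0)', 'buffer': 0.5}`.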

Update geospatial Device records upon deployment

Milestones - Tasks - Actions addressed

  • M112-T132-A001 - Upon deployment, updates Devices with Site and Deployment attributes

Milestone User Interface

M112 - Wireframe v1 (030114)

M112 - Wireframe v2 (031714)

Milestone Tasks

Task Description
Identify Geospatial search requirements Analyze existing requirements and propose revisions for milestone work. Communicate with Marine IOs and other stakeholders to determine specific detailed features and analyze available documentation as needed. Work with the System Engineer on a requirements revision proposal.
Design Geospatial search behavior model Develop detailed designs of this milestone's capabilities as needed for subsequent implementation and integration with the production system and other components. Identify core interfaces and dependencies to the system and to other components. Describe core interfaces provided. Get review from system architect and make design artifacts available in the CI architecture documentation.
Define enhanced geospatial indexes in database Implement database indexes as designed and scoped using the integrated database technology. Develop tests to demonstrate the correct operation of the indexes.
Enhance discovery to use enhanced indexes Implement the capability as designed and scoped. Develop unit and integration tests to demonstrate the correct operation of the code.
Enhance resource attributes for geospatial resources Implement the capability as designed and scoped. Develop unit and integration tests to demonstrate the correct operation of the code.
Enhance business logic for geospatial resources Implement the capability as designed and scoped. Develop unit and integration tests to demonstrate the correct operation of the code.
Integrate and test with production environment Take all developed software capabilities of this milestone and integrate them with the remainder of the system. Demonstrate the correct function of the additions through successful automatic tests running against a fully launched system and by interactive demonstration on the test/alpha system.
Add Spatial Operator (view) (ion-ux) Add button group to Advanced Search "GEOSPATIAL BOUNDS" form: ('spatial_operator' options - overlap/intersects,within,contains,disjoint) [UI task]
Add Spatial Operator (controller) (ion-ux) Create spatial_operator key in service API to pass to discovery service
Add Spatial Operator (service) (coi-services) Add spatial_operator parameter to discovery service [_qmatcher_geo_loc]
User Defined Limits (view) (ion-ux) Add form dropdown for number of desired results to return from search (eg. 100,200,500) [UI task]
User Defined Limits (controller) (ion-ux) Handle 'limit' field in service API (limit currently set in code not user option)
Return number of total results (view) (ion-ux) Display total number of search results available in DB. eg. showing 0-10 of 100 (14,567 available) [UI task]
Return number of total results (service) (coi-services) Return total results available in DB from discovery service (beyond specified limit)
Search Offset (view) (ion-ux) Add "next n" button/link below search result navigation to get next set of results past limit. eg. showing 91-100 of 100 (click to retrieve next 101-200) [UI task]
Search Offset (controller) (ion-ux) Create 'offset' key with value in service API to pass to discovery service
Search Offset (service) (coi-services) Process an offset parameter in discovery service to pass to Postgres OFFSET value (allows search to skip n records)

Milestone Design

Identify Geospatial search requirements

Analyze existing requirements and propose revisions for milestone work. Communicate with Marine IOs and other stakeholders to determine specific detailed features and analyze available documentation as needed. Work with the System Engineer on a requirements revision proposal.

Design Geospatial search behavior model

Develop detailed designs of this milestone's capabilities as needed for subsequent implementation and integration with the production system and other components. Identify core interfaces and dependencies to the system and to other components. Describe core interfaces provided. Get review from system architect and make design artifacts available in the CI architecture documentation.

Define geospatial capabilities in database

Implement database indexes as designed and scoped using the integrated database technology. Develop tests to demonstrate the correct operation of the indexes.

Consider the OpenGEO Indexing Tutorial. For tables in PostGIS that will have geospatial support, the indexing strategy and the creation of indexes will need to be designed and implemented. This logic is probably best placed wherever the CREATE TABLE logic is implemented. A simple scan of the resource fields to identify any fields that are geometries should suffice; an index is then added to the database for each of them.
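
The field scan could look roughly like the sketch below. It is an assumption about one possible approach, not the pyon implementation: the function name, the field-type mapping and the table name in the usage note are hypothetical. The generated statements use the standard PostgreSQL `CREATE INDEX ... USING GIST` form that PostGIS relies on for spatial indexes. Identifiers are interpolated without quoting, so this is only suitable for trusted, internally defined names.

```python
def gist_index_statements(table, field_types, geometry_types=("geometry",)):
    """Return one CREATE INDEX statement per geometry-typed column.

    field_types maps column name -> declared type name (hypothetical shape).
    """
    statements = []
    for column, type_name in sorted(field_types.items()):
        if type_name.lower() in geometry_types:
            statements.append(
                f"CREATE INDEX {table}_{column}_gist "
                f"ON {table} USING GIST ({column});"
            )
    return statements
```

Called with a table named `ion_resources` and columns `geom` and `geom_loc` declared as `geometry`, it returns two statements, one per geometry column.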

The new resource registry postgres implementation supports and fills 4 geometry/temporal columns:
  • geom: the geospatial center point
  • geom_loc: the area bounding box for the resource
  • vertical_range: the vertical range for the resource - postgres numrange type
  • temporal_range: the temporal range for the resource - postgres numrange type

So besides the point queries, we now also support intersect, overlap and containment queries against a resource's bbox.

The geom column is filled from the geospatial_point_center attribute; the geom_loc and vertical_range columns are filled based on the constraint_list and the north/south/east/west and depth min/max coordinate values.
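
As a rough illustration of this mapping (not the actual base_store code), the sketch below turns a center point and a set of bounds into a WKT point, a WKT polygon and a PostgreSQL range literal. The dictionary keys and the function name are hypothetical.

```python
def geo_columns(center, bounds):
    """Map resource attributes to geom / geom_loc / vertical_range values.

    center: {'lon', 'lat'}; bounds: {'west', 'south', 'east', 'north',
    'depth_min', 'depth_max'} (hypothetical key names).
    """
    geom = f"POINT({center['lon']} {center['lat']})"
    w, s, e, n = (bounds[k] for k in ("west", "south", "east", "north"))
    # Closed ring: the first vertex is repeated as the last vertex.
    geom_loc = f"POLYGON(({w} {s},{e} {s},{e} {n},{w} {n},{w} {s}))"
    # Inclusive range literal in PostgreSQL range syntax.
    vertical_range = f"[{bounds['depth_min']},{bounds['depth_max']}]"
    return {"geom": geom, "geom_loc": geom_loc, "vertical_range": vertical_range}
```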

See more details here: https://confluence.oceanobservatories.org/display/CIDev/Postgres+Datastore

Enhance discovery to use geospatial information in data store

Implement the capability as designed and scoped. Develop unit and integration tests to demonstrate the correct operation of the code.

Documented example queries here. Please feel free to add to this list:
https://confluence.oceanobservatories.org/display/CIDev/Postgres+SQL+Snippets

The PostgreSQL query planner can use indexes automatically when they are available; without an index, a brute-force (exhaustive) search is used.

The way to query geospatial via the discovery service is this code:
https://github.com/ooici/coi-services/blob/master/ion/services/dm/presentation/discovery_service.py#L1128-L1130
https://github.com/ooici/pyon/blob/master/pyon/datastore/datastore_query.py#L137-L150
https://github.com/ooici/pyon/blob/master/pyon/datastore/postgresql/pg_query.py#L64-L75

Enhance business logic for geospatial resources

Implement the capability as designed and scoped. Develop unit and integration tests to demonstrate the correct operation of the code.

We will define a resource type hierarchy that supports geometries that are intended to be geospatially indexed.

  • Geometry
  • Point
  • Circle
  • Square
  • Polygon

A Resource that intends to have a field or subset of fields that are geospatially indexed will include a field that is of a geometric type:

See here for the code that fills the geom* columns in the resource registry:
https://github.com/ooici/pyon/blob/master/pyon/datastore/postgresql/base_store.py#L469
https://github.com/ooici/pyon/blob/master/pyon/datastore/postgresql/base_store.py#L369-L437
Please discuss any modifications here with MMEisinger.

The resource registry will need to be modified so when the tables are created and a field of type Geometry is created an appropriate PostGIS data type is selected and a proper index is created to geospatially index the resource.

We will refactor the existing discovery code to use PostGIS capabilities for search and navigation as well as geospatial search. We will expose GIS searching capabilities through discovery service.

Integrate and test with production environment

Take all developed software capabilities of this milestone and integrate them with the remainder of the system. Demonstrate the correct function of the additions through successful automatic tests running against a fully launched system and by interactive demonstration on the test/alpha system.

Design References and Context

Design Notes

R3 ElasticSearch Design etherpad

includes postgis notes

http://etherpad.oceanobservatories.org/r3elasticsearch

PostGIS and Location Aware Resources

After our migration efforts for the milestone M166 PostgreSQL data store, we should be able to leverage the feature set of PostGIS to provide OOIN and clients with geospatial awareness for all system resources that have a geospatial identity. Once PostGIS is installed and the PostgreSQL database has the GIS extension installed, extending resources to include GIS-aware objects is simple.

GIS Objects
The GIS objects supported by PostGIS are a superset of the "Simple Features" defined by the OpenGIS Consortium (OGC). As of version 0.9, PostGIS supports all the objects and functions specified in the OGC "Simple Features for SQL" specification.

PostGIS extends the standard with support for 3DZ,3DM and 4D coordinates.

The OpenGIS specification defines two standard ways of expressing spatial objects: the Well-Known Text (WKT) form and the Well-Known Binary (WKB) form. Both WKT and WKB include information about the type of the object and the coordinates which form the object.

Examples of the text representations (WKT) of the spatial objects of the features are as follows:

  • POINT(0 0)
  • LINESTRING(0 0,1 1,1 2)
  • POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))
  • MULTIPOINT(0 0,1 2)
  • MULTILINESTRING((0 0,1 1,1 2),(2 3,3 2,5 4))
  • MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1)))
  • GEOMETRYCOLLECTION(POINT(2 3),LINESTRING(2 3,3 4))

The database provides the capability to query against spatial relationships, using standard geometrical relationships: contains, within, touches, etc.

Here is a quick SQL example of the geospatial capabilities:

PostGIS also supports parsers for standard industry shapefiles including KMZ, ESRI Shape files etc. This may play a role if we provide users with the capability of inputting system resources and defining shape boundaries for the resources.

Application Resources that are geo-aware

Resource Location type Notes
Observatory point or polygon  
PlatformSite point  
InstrumentSite point  
DataProduct point or polygon polygon for glider
Deployment point or polygon  
Site point or polygon  

Application Resources that are geo-searchable

Add geospatial and temporal attributes to the Device resource

Implementation Notes

Discussion Notes

Discussion Tuesday October 29th,

These are the steps I took in order to install postgis on a near-fresh machine:

The prerequisite is that python2.7 is installed via brew.

To verify that it was installed correctly:

Discussion Tuesday 31 October

Prototype using a single column in the resource table to contain the geodata (each row represents a single resource)

  • define the types of geometries required to represent various resource types: point, rectangle
  • if a resource does not have a geo-location then simply leave it as null
  • queries should be a standard PostGIS select and efficient:
    • find all instrument devices of model CTDSMP37 in this rectangle
  • OGC externalization plans are next phase
    • see how much of the standard we can support with the above simple model.

Initial Prototyping

1 Nov

MMeisinger

All, you can now try out the Postgres resource registry branch. It is ready to use for initial investigations and for call tracing. It works with the full demo of R2 alpha preload, UI and streaming except for discovery service/ES integration. No changes to coi-services required, other than change pyon and ion-definitions submodules, install postgresql and driver and add a bit of pyon.local.yml:
https://confluence.oceanobservatories.org/display/CIDev/Postgres+Datastore (see at bottom)
It's very easy to use and you can switch back and forth coi-services master and coi-services postgres_merge branch without issues. You don't even have to change pyon.local.yml

I've added this and other information to the "central" Postgres page on Confluence:
https://confluence.oceanobservatories.org/display/CIDev/Postgres+Datastore

I just enhanced the Postgres datastore to set a geometry column (currently based on the geospatial_point_center value). This works nicely for the BETA preload:

Then I tried a bounding box query:

It seems to work. An arbitrary number of extensions are conceivable.

LCampbell
If you want to add PostgreSQL to your supervisor config scripts so that it's managed as a daemon by supervisor:

Status Discussion 31 Jan with Luke, Brian, Michael, Tim and Maurice

  • Issues working thru the several layers of discovery search. Brian feels he has a handle on it now.
  • Need to focus on a full suite of integration tests that demonstrate various search types (overlap, bounding box, etc.) for multiple search types
    • There are 3 weeks allocated to integration with the UI, this is the time that will test UI-created searches
  • Must define which application level resources needed for search
    • which need location attributes
    • which would be found by searching for an associated resource and how would that work
      • A device does not have location attributes but it is deployed to a site which does have location. Search for all devices in a bounding box.
    • Temporal and geo search scenarios?

Geospatially indexed types email thread. MMeisinger 4 Feb

I suggest to run ooi_beta.yml preload for a full set of resources to investigate for search. Hint: you can run it once and then just restart the system after code changes without a new reload.

(1) Current Situation

Currently we have geospatial information for these resources:

  • Observatory - only geospatial
  • PlatformSite - only geospatial
  • InstrumentSite - only geospatial
  • DataProduct

Site is an abstract base type and will never exist as a resource in the system. Subsite is not used for OOI deployments.

Currently these resource types have temporal information:

  • Deployment - only temporal
  • DataProduct

(2) Desirable information

  • Deployment is associated with a Site via a hasDeployment association. This means that the Deployment has access to a unique set of geospatial metadata and could be queried this way, even historically or for the future.
  • Devices when deployed as primary are associated with a site via a hasDevice association. This means that the Device has access to a unique set of geospatial metadata and could be queried this way for certain epochs (note: not historically)
  • Devices obviously have a physical life span (date of purchase until date of decommissioning/retiring). This information could be added as metadata with an easy change but it's not currently in the YML/preload
  • DataProducts have metadata that is manually set. Obviously it may be desirable to update this metadata periodically (e.g. in a batch process) given the actual coverage. The question is a semantic one: the DataProduct metadata holds the nominal temporal range, not the actual one. The actual data may have gaps, so you are always dealing with bounding boxes, not with exact matches.
  • It may be desirable to issue historic queries: Find all devices that were deployed at area1 in historic temporal range. We have this information through Deployment resources but the query may be a bit more complicated
  • Dataset resources may be interesting too, but maybe less so to the user
  • There is a dependency between the derived DataProducts for a device (e.g. L1) and the parsed DataProduct. These would need to be kept in sync
  • There will be additional complication with site DataProducts that have different device associations that are valid for certain time periods only.

(3) Next steps

I believe we should discuss these directions:

a: Can we enhance the resource metadata for more resource types?

b: Can we define processes that update resource metadata recurringly to make it more reliable? Or on certain service calls, e.g. activate_deployment

c: Can we extend the discovery service / datastore query classes with abstracted SQL queries that use associations to find more resources - careful here.

d: What are the user expectations for searching and finding, for navigating resource trees and for maintaining resource metadata and do we meet these right now.

Design and planning discussion. MMeisinger, TAmpe, MManning 11 Feb

Given the remaining time on this milestone we will need to provide a trimmed set of geosearch capabilities that still allows the user to locate critical resources.

  • Add geospatial and temporal attributes to the Device resource
    • Deployment activation/deactivation processing can assign the location from the deployed site and the temporal extent from the deployment resource to the device.
  • Data Product resource geospatial and temporal information will be updated via batch processing.
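
The device update on deployment activation can be sketched as below. This is a hypothetical illustration using plain dictionaries; the attribute names `geospatial_bounds` and `temporal_bounds` and the function name are assumptions, not the actual resource model or service call.

```python
import copy


def activate_deployment(device, site, deployment):
    """Return a copy of `device` carrying the site's location and the
    deployment's temporal extent (hypothetical attribute names)."""
    updated = copy.deepcopy(device)
    # Location comes from the site the device is deployed to.
    updated["geospatial_bounds"] = copy.deepcopy(site["geospatial_bounds"])
    # Temporal extent comes from the deployment resource.
    updated["temporal_bounds"] = copy.deepcopy(deployment["temporal_bounds"])
    return updated
```

Returning a copy rather than mutating the input keeps the sketch side-effect free; a real implementation would presumably update the resource in the registry instead.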

Note that currently in preload, OOI sites have geospatial information but do not have a temporal range. Non-OOI sites currently have both geospatial and temporal information. Ideally this could be aligned with a script if time allows.

MMeisinger has defined an approach to support circle and polygon shaped bounding boxes in Discovery and will implement this near the end of this milestone delivery date.

  1. Jan 31, 2014

    Michael Meisinger says:

    Assuming the search is implemented here some pointers for expanding this milestone in case

    1. Finding more resource types: Only a few resource types have geospatial-temporal metadata in them. Do we expect to find more resource types (e.g. devices) for current deployments? If yes, geospat-temp metadata could be added to new resource types, e.g. automatically on deployment or in the YML model; or resources could be found by association, or metadata could be taken/updated from the coverage model.

    2. Finding more accurately: Is what the datastore places in the 4 geospat-temp columns correct? Do we need to refine this algorithm?

    3. Are the operators the search API provides sufficient? Probably they are but maybe there need to be extensions for by association etc.

    4. Do we need automated processes that extract geospat-temp metadata from coverage or events and update resource metadata?