Skip to end of metadata
Go to start of metadata
This page describes Postgres integration with an advanced query search engine

Summary

The needs are:

  • The OOI Network UI makes queries to the Discovery service to resolve advanced queries for user information. Examples of these queries are
    • Search by resource type
    • Search by content of any resource attribute
    • Search by geospatial properties of a resource (location + vertical)
    • Search by temporal properties
    • Combination of search filters with AND and OR
    • Search for events
  • Discovery service needs a flexible search engine to map queries to. The search engine needs to have suitable indexes defined.

Implementation Details

Configuration for couch based system with ElasticSearch

These configuration settings must be active in the container configuration:

Bootstrap Sequence

During system bootstrap, indexes are defined in the bootstrap-index plugin. Some code is in the Index Management Service, utilizing the elasticpy library to make RESTful HTTP calls to ElasticSearch.

Supporting services:

Discovery Service Use in R2

From the UI

From other places

PostGIS Search

Basics

  • PostGIS is an open source software program that adds support for geographic objects to the PostgreSQL object-relational database. PostGIS follows the Simple Features for SQL specification from the Open Geospatial Consortium (OGC).
  • PostGIS is a plugin to PostgreSQL.
  • It defines new column data type geometry and functions to query for values of this data type

Installation

See M112 Geospatial Search for additional installation information and instructions in case of some install errors

To verify that it was installed correctly - start PostgreSQL first:

References

Discovery with Postgres and without ElasticSearch

Investigations

Postgres - ElasticSearch Integration

Installation

Installation of JDBC river and JDBC driver for Postgres. Apparently JDBC driver 9.2 is compatible with Postgres 9.3

Use

References

Postgres - ElasticSearch River

  • The ElasticSearch river is a JDBC plugin. It defines a SQL to get the data, a query strategy and a time interval to repeat the query
  • Strategy for data load
    • Full table load (oneshot or simple)
    • A handshake is possible to mark all columns that have been indexed with a flag and then only select new entries

Possibilities:

  • Create a view for each ES index that can be queried in order or with ack
  • Don't use ES and go directly against PG in discovery service

Findings:

  • The JDBC river is very limited in that it expects SQL result sets with named columns to contain the attributes of the indexed document
  • Tried various ways of getting the JSON content into indexable form
    • Could get resources table into ES but doc is a json string that cannot be accessed 
      • select id,name,ts_created,lcstate,doc:varchar from ion_sterling_resources
    • ES update functions: must be called manually and cannot parse JSON
      • Scripting launguage is mvel but others are supported
    • Structured Objects: return multiple rows with same _id with key-value pairs. Not supported by river
      • with rt as (select id as _id, json_each_text(doc) as attr from ion_sterling_resources) select rt._id, (attr).key, (attr).value from rt
    • No script possibility within JDBC river
    • JDBC river does not support the json data type in PG
  • No success - giving up

Notes: http://etherpad.oceanobservatories.org/r3elasticsearch

Enter labels to add to this page:
Please wait 
Looking for a label? Just start typing.