PostgreSQL is the datastore backend technology for R2 and R3 OOI Network, replacing CouchDB and ElasticSearch as of December 2013. Through the Pyon datastore layer it is interface compatible with the CouchDB backend. This page describes the integration of PostgreSQL into the OOI Network and provides installation, configuration and use instructions.

Summary

PostgreSQL and PostGIS

PostgreSQL is a full-featured, high-performance SQL database with proven scalability, over 20 years of development history and broad operational support. It provides a JSON datatype and advanced indexing capabilities. The Python drivers are based on an efficient C library for high-performance database access.

PostGIS is a PostgreSQL extension (see here) that provides advanced geospatial search and manipulation.

This page was written Dec 2013 for PostgreSQL 9.3.x on MacOS. In case of installation trouble, see Python and Virtualenv Mac Troubleshooting.

Developers: How to skip this update and keep CouchDB (for now)

Postgres will be the default datastore for OOINet starting Dec 20, 2013. The code supports both the Postgres and CouchDB/ElasticSearch options, but Postgres will be the default. If you want to continue running against CouchDB, please add these lines to your pyon.local.yml:
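A minimal sketch, assuming the container.datastore.default_server CFG property described under Services Integration below (the exact keys in the original snippet may differ):

```yaml
# Hypothetical sketch -- selects CouchDB instead of Postgres as the
# container default datastore
container:
  datastore:
    default_server: couchdb
```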

Please make sure you don't have 2 container: entries in your pyon.local.yml. Otherwise one of them will be ignored and mysterious behavior will result!

Everything else stays the same and code and tests run like before.

Postgres Installation and Basic Usage (Mac OS-X)

Please follow the steps described below and adjust as necessary if you already have a Postgres installation or are not installing on a Mac. This has been tested on MacOS 10.8 with the latest brew and Xcode. Installation on Linux is also possible but not described here.

Please make sure your brew and system environment are up to date, and meticulously follow brew doctor's hints. Even if they do not sound related, they are mostly correct and indicate problems with your installation.

PostgreSQL and PostGIS Installation

As of January 2014, the following versions should be installed:
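As a sketch, assuming a brew-based install (the version listing from the original page is not reproduced here):

```shell
brew update
brew install postgresql   # should yield a 9.3.x version
brew install postgis      # PostGIS extension for geospatial support
```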

Please make sure you run a postgresql 9.3.x version. If you had a prior postgres installation, upgrade to 9.3.x and delete any prior databases.

For more PostGIS info and in case of trouble with the installation see also:

Create PostgreSQL databases and users

Create a default "postgres" database:
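A minimal sketch, assuming a fresh brew installation and the default data directory /usr/local/var/postgres (the path listed in the configuration table below):

```shell
# Initialize the database cluster; this creates the default "postgres" database
initdb /usr/local/var/postgres -E utf8
```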

Start the database in a different shell window:
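Assuming the default data directory, this runs the server in the foreground:

```shell
postgres -D /usr/local/var/postgres
```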

Starting the database server can and should be automated by adding postgres to your supervisord.conf, if you are using this tool (recommended). See below for supervisord instructions.

Create a second user "ion" required by the OOINet:
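A sketch, assuming a plain role without superuser rights (the Database Layout section below notes the "ion" application user is limited to CRUD operations):

```shell
createuser ion
```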

Check that everything works by executing:
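For example (assumed check; any successful query confirms the server is reachable):

```shell
psql postgres -c "SELECT version();"
psql -l    # lists databases; "postgres" should appear
```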

Update coi-services dependencies

New dependencies such as the postgres driver library are required and code has changed directories. Please run the following after pulling in the code that has the postgres updates:
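A sketch of the usual coi-services buildout workflow (assumed; adjust to your checkout and branch):

```shell
git pull
python bootstrap.py
bin/buildout
bin/generate_interfaces
```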

This completes all steps necessary for initial users. bin/pycc and all unit and integration tests should work at this point as before.

Try the postgres container by running:
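For example:

```shell
bin/pycc
```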

The log output should indicate connection to the postgres server, creation of a database and then result in the normal container shell prompt with no other errors. Logging in to postgres should show some of the tables created:
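A sketch, assuming a default sysname so that the database name is "ion_&lt;hostname&gt;_ion" (see Database Layout below):

```shell
# List the ion_* tables in the sysname database
psql -d ion_$(hostname -s)_ion -c '\dt'
```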

Optional: Postgres admin UI tool (Mac)

Postgres comes with the psql command line client shown above. A powerful GUI admin tool can be downloaded here: http://www.pgadmin.org/

In order to connect to your localhost PostgreSQL instance, just add a new connection in the tool to localhost, with your login name as username and an empty password (these are the brew install defaults).

The PostgreSQL default installation is configured to only accept localhost connections. For how to make the database server accessible via the network, please see below.

Postgres Installation and Basic Usage (CentOS 6)

Execute the following commands
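A sketch using the PGDG yum repository (repository URL and package names assumed; the original listing is not reproduced here):

```shell
# Install the PGDG repository for CentOS 6, then PostgreSQL 9.3 and PostGIS
sudo rpm -ivh http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm
sudo yum install postgresql93-server postgresql93-contrib postgis2_93
sudo service postgresql-9.3 initdb
sudo service postgresql-9.3 start
sudo chkconfig postgresql-9.3 on
```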

Contents of pg_hba.conf:
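A typical trust-based developer configuration (assumed), which allows local connections without a password:

```
local   all   all                 trust
host    all   all   127.0.0.1/32  trust
host    all   all   ::1/128       trust
```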

Advanced Configuration and Use

The following advanced steps should not be required for OOI Network development.

Postgres Usage

Server Use

Start Postgres server in foreground mode (Ctrl-C to exit):
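Assuming the default data directory:

```shell
postgres -D /usr/local/var/postgres
```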

Start/stop Postgres server as a daemon manually (on Mac):
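A sketch using pg_ctl with the data directory and logfile paths listed in the configuration table below:

```shell
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
pg_ctl -D /usr/local/var/postgres stop
```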

Connect:
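For example:

```shell
psql postgres
```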

The PostgreSQL default installation is configured to only accept localhost connections. For how to make the database server accessible via the network, please see below.

Use Supervisord to automatically start Postgres as a daemon

See the Container Use Guide for details about supervisord and how to configure it. This is strongly recommended.

Add the following snippet to your /etc/supervisord.conf (change settings as needed):
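A sketch (paths and settings assumed for a brew install; the original snippet is not reproduced here):

```
[program:postgres]
command=/usr/local/bin/postgres -D /usr/local/var/postgres
autostart=true
autorestart=true
stopsignal=INT
```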

To update the running supervisord instance execute:
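For example:

```shell
supervisorctl reread
supervisorctl update
```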

Go to http://localhost:9001/ to verify running processes and check for errors

Enabling remote access to server (if needed)

  • Add the line "host all all 10.10.29.0/24 trust" to file /usr/local/var/postgres/pg_hba.conf (adjust the IP range to your network)
  • Uncomment/set "listen_addresses='*'" in file /usr/local/var/postgres/postgresql.conf
  • Restart server

See http://www.cyberciti.biz/tips/postgres-allow-remote-access-tcp-connection.html

Common SQL Queries

For common queries (select statements) and other database operations, see Postgres SQL Snippets

Postgres Configuration

The following is for advanced settings and configuration only and is not required for initial use.

Directories/Files on a Mac:

Directory/File Description
/usr/local/Cellar/postgresql/9.n.n Binaries placed by brew install for a Postgres installation
/usr/local/var/postgres Postgres database, can be generated using initdb tool
/usr/local/var/postgres/server.log Postgres server logfile
/usr/local/var/postgres/pg_hba.conf Remote access configuration file
/usr/local/var/postgres/postgresql.conf Postgres configuration file. For tuning, see below

Configuration settings in postgresql.conf:

Changing the postgres config file is not required for initial development. For a more performant database configuration, edit the file and enter these values:

Setting Suggested Value Description
max_connections 100 Max number of open connections. Not relevant for a developer system
shared_buffers 512MB Memory for caching blocks. Should be set to 1/4 of physical memory on a dedicated system and less on a developer machine
work_mem 5MB Memory per join/sort operation
maintenance_work_mem 64MB Memory reserved for vacuum
synchronous_commit off Await a full fsync on disk writes. Can be off for a developer machine; the risk is minimal data loss, never data integrity. Do not set fsync=off, which does risk database integrity
checkpoint_segments 32 Number of WAL segments between automatic checkpoints
checkpoint_completion_target 0.9 Fraction of the checkpoint interval over which checkpoint writes are spread
random_page_cost 2.0 Instructs the query planner to use a lower lookup cost for random block reads (e.g. with SSDs)
effective_cache_size 1024MB Hints to the query planner the memory available for the database cache (shared buffers) plus the OS disk cache. Up to 1/2 of physical memory

An example postgresql.conf for a well equipped MacBookPro is attached here.

OOI Network Integration

Pyon Datastore implementation

  • We use Python psycopg2 2.5 library
  • Code is split into a postgresql/base_store.py with all Pyon/ION independent datastore access and postgresql/datastore.py with ION specific extensions. 
  • postgresql/pg_util contains common utilities, such as a connection pool, tracing connections and cursors for query logging and a common connection function.
  • Psycopg2's cursor.execute(statement, arg_list_or_dict) encodes and escapes value arguments provided in the statement with %s or %(name)s notation. It does NOT handle database, table or column names, which is why there is a mix of Python string concatenation and Psycopg2 argument replacement. The Python % format operator cannot be used, because it would conflict with the Psycopg2 argument replacement notation
  • We concatenate our own SQL statements instead of using SQLAlchemy. SQL is already a public interface language, here in the Postgres dialect. Using SQLAlchemy would make the code more complex, and there is currently no need for RDBMS portability. All SQL is located within 2 modules and can be replaced when desired
  • Sysname handling is as follows:
    • The Postgres database itself is prefixed with the sysname, e.g. sysname_ion. A default sysname starts with ion, leading to a database name of ion_hostname_ion
  • The interface for application-level code to use efficient datastore queries is to either use a specialized RR function (e.g. find_resources_ext, find_objects) or to use find_by_view and then process the view results. Postgres does not have views like the ones defined in Couch. Therefore, the postgres backend takes the parameters for reading the view, translates them into a SQL query and then assembles the view contents. This can be tricky because of the different ways in which a view can be used (key query, multi-key query, range query, sorted result expectation, pagination); there are case decisions in the code, but the actual mapping is straightforward
  • The datastore uses a container global connection pool. The maximum number of concurrent connections can be configured.
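The split between identifier concatenation and psycopg2 value placeholders described above can be sketched as follows (table and column names here are hypothetical, not the actual Pyon schema):

```python
# Identifiers (database, table, column names) cannot be passed as psycopg2
# arguments, so they are concatenated into the SQL text directly. They come
# from code, never from user input.
table_name = "ion_resources"  # hypothetical table name

# Values are left as %(name)s placeholders for psycopg2 to encode and
# escape at execution time.
statement = "SELECT id, doc FROM " + table_name + " WHERE type_ = %(type)s"

# At execution time psycopg2 binds and escapes the value, e.g.:
#   cursor.execute(statement, {"type": "InstrumentDevice"})
# Python's own % operator cannot be used to build the statement, because it
# would consume the %(type)s placeholder before psycopg2 ever sees it.
print(statement)
```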

See also:

Database Layout

  • There is one Postgres database created for each sysname. The database name is prefixed with the sysname, e.g. sysname_ion. A default sysname starts with ion, leading to a database name of e.g. "ion_hostname_ion"
  • The database requires an admin user (role) and an application user. The admin user, typically the user login name on developer machines or "ooiuser" on production machines, creates the database, tables and functions. The application user, typically "ion", performs all the CRUD operations with data and is limited to these operations.
  • Database SQL scripts are located in the ion-definitions git repository (submodule) under "res/datastore/postgresql"
    • Common database extensions and function definitions are located in the db_init.sql script and executed once on the creation of a new database
    • Datastore profile specific scripts are called "profile_name.sql" where name is the name of a datastore profile (e.g. resources, objects, events, state, basic)
  • Tables and indexes are prefixed with "ion_"
  • Logical pyon datastores are represented as a set of tables and indexes with a common prefix and a behavior defining profile. 
    • A profile initialization script is executed (defaulting to basic), creating different tables and indexes. E.g. the datastore "resources" with profile "resources" leads to the execution of the SQL script "profile_resources.sql" and creates tables "ion_resources", and others prefixed with ion_resources.
    • The number of datastores is not limited
  • Indexes are prefixed with the table name they apply to and are suffixed with "_idx"
  • Referential integrity ensures that resource associations and attachments cannot remain present if their resource is deleted. Cascade delete is defined in the schema.
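The cascade-delete relationship described above can be sketched in SQL (table and column names are hypothetical, not the actual profile scripts):

```sql
-- Hypothetical sketch: deleting a row from ion_resources automatically
-- removes any association rows that reference it
CREATE TABLE ion_resources (
    id   varchar(300) PRIMARY KEY,
    doc  json
);
CREATE TABLE ion_resources_assoc (
    id   varchar(300) PRIMARY KEY,
    s    varchar(300) REFERENCES ion_resources(id) ON DELETE CASCADE,
    o    varchar(300) REFERENCES ion_resources(id) ON DELETE CASCADE,
    doc  json
);
```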

Services Integration

  • The container creates and provides access to a set of datastores via higher level interfaces
    • "resources" for the resource registry, via container.resource_registry
    • "objects" for the default object store, via container.object_store
    • "events" for the event repository, via container.event_repository
    • "state" for the process state repository, via container.state_repository
    • Services may create other datastores, as needed
  • The datastore interface is the same for all datastore types, e.g. couchdb and postgresql. A common DatastoreFactory provides a means to request a datastore based on the configured default datastore.
  • The container default datastore is set via the CFG property "container.datastore.default_server". An entry must exist under CFG "server.servername" for the given type of datastore.
  • Running with a PostgreSQL datastore does not require ElasticSearch as a search index. ElasticSearch should be disabled in the CFG via "bootstrap.use_es=False"
  • The discovery service is aware of the Postgres datastore and will perform searches via the appropriate postgres datastore instead of via ElasticSearch. It supports both search strategies and will operate based on system configuration. 

Documentation

PostgreSQL
