View Source

{info}PostgreSQL is the datastore backend technology for R2 and R3 OOI Network, replacing CouchDB and ElasticSearch as of December 2013. Through the Pyon datastore layer it is interface compatible with the CouchDB backend. This page describes the integration of PostgreSQL into the OOI Network and provides installation, configuration and use instructions.{info}

h2. Summary

h3. PostgreSQL and PostGIS

PostgreSQL is a full featured high performance SQL database with proven scalability properties, an over 20 year development history and lots of operational support. It provides a JSON datatype and advanced indexing capabilities. Python drivers are based on an efficient C library for high performance database access.

PostGIS is a PostgreSQL extension ([see here|http://postgis.net/]) that provides advanced geospatial search and manipulation.

{tip}
This page was written Dev 2013 for PostgreSQL 9.3.x on MacOS. In case of installation trouble, see [CIDev:Python and Virtualenv Mac Troubleshooting].
{tip}

h3. Developers: How to skip this update and keep CouchDB (for now)

Postgres will be the default datastore for the OOINet starting Dec-20 2013. The code supports both Postgres and CouchDB/ElasticSearch options, but postgres will be the default. If you want to continue running against CouchDB, please add these lines to your pyon.local.yml:

{code}
container:
datastore:
default_server: couchdb
{code}

Please make sure you don't have 2 container: entries in your pyon.local.yml. Otherwise one of them will be ignored and mysterious behavior will result\!

Everything else stays the same and code and tests run like before.


h2. Postgres Installation and Basic Usage (Mac OS-X)

Please follow the steps described below and adjust as necessary if you already have a Postgres installation or are not installing on a Mac. This has been tested with MacOS 10.8 with latest brew and XCode. Installation for Linux is also possible but not described here.

Please make sure your brew and system environment are up to date. Please meticulously follow brew doctor's hints. Even if they sound not related, they mostly are correct and indicate problems with your installation.

{code}
brew update
brew doctor
{code}

h4. PostgreSQL and PostGIS Installation

{code}
brew install postgresql
brew info postgresql

brew install postgis
{code}

As of January 2014, the following versions should be installed:

{code}
$ brew info postgresql
postgresql: stable 9.3.2 (bottled)

$ brew info postgis
postgis: stable 2.1.1, HEAD
{code}

{warning}Please make sure you run a postgresql 9.3.x version. If you had a prior postgres installation, upgrade to 9.3.x and delete any prior databases.{warning}


For more PostGIS info and in case of trouble with the installation see also:
* [CIDev:Python and Virtualenv Mac Troubleshooting]
* [CIDev:M112 Geospatial Search]
* [CIDev:Postgres Discovery ElasticSearch PostGIS]
* Note: An alternative Mac installer is available here: [http://postgresapp.com/] (not recommended for developers)

h4. Create PostgreSQL databases and users

Create a default "postgres" database:

{code}
initdb /usr/local/var/postgres -E utf8
{code}

Start the database in a different shell window:

{code}
/usr/local/bin/postgres -D /usr/local/var/postgres
{code}

{tip}
Starting the database server can and should be automated by adding postgres to your supervisord.conf, if you are using this tool (recommended). See below for supervisord instructions
{tip}

Create a second user "ion" required by the OOINet:

{code}
createuser ion
{code}

Check that everything works by executing:

{code}
psql postgres
-- You are now in the psql shell. Exit with Ctrl-D or \q. Help with \?
\l
\du
-- Connect to postgres database as user ion
\c postgres ion
\q
{code}

h4. Update coi-services dependencies

New dependencies such as the postgres driver library are required and code has changed directories. Please run the following after pulling in the code that has the postgres updates:

{code}
workon coi
cd coi/services
ant clean
bin/buildout
bin/generate_interfaces
{code}

{tip}
This completes all steps necessary by initial users. bin/pycc and all unit and integration tests should work at this point similar to before.
{tip}

Try the postgres container by running:

{code}
bin/pycc -fc
{code}

The log output should indicate connection to the postgres server, creation of a database and then result in the normal container shell prompt with no other errors. Logging in to postgres should show some of the tables created:

{code}
psql postgres
\l
-- This should show a new database (your sysname)_ion. Let's connect to it
\c (your sysname)_ion
-- List the new tables
\dt
-- Show the resources table
SELECT id, type_, name from ion_resources;
\q
{code}


h4. Optional: Postgres admin UI tool (Mac)

Postgres comes with the psql command line client shown above. A powerful GUI admin tool can be downloaded here: [http://www.pgadmin.org/]




In order to connect to your localhost PostgreSQL instance, just add a new connection in the tool to localhost with your loginname as username and empty password (these are the brew install defaults)

{tip}
The PostgreSQL default installation is configured to only accept localhost connections. For configuration how to make the database server accessible via the network, please see below.
{tip}


h2. Postgres Installation and Basic Usage (CentOS 6)

Execute the following commands

{code}
sudo su -
# add EPEL yum repo
yum install http://mirror.pnl.gov/epel/6/i386/epel-release-6-8.noarch.rpm
# add PostgreSQL 9.3 yum repo
yum install http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm
# install packages
yum install postgresql93-server postgresql93-contrib postgresql93-plpython postgis2_93 proj
# add pg_trgm to shared libs
echo "shared_preload_libraries = 'pg_trgm'" >> /usr/lib/pgsql/9.3/data/postgresql.conf
# add postgres ident line, change other lines to md5; see below
vim /usr/lib/pgsql/9.3/data/pg_hba.conf
# start posgres
service postgresql-9.3 initdb
service postgresql-9.3 start
su - postgres
createuser -P ion
# set password to ion and make super user
{code}

Contents of pg_hba.conf:

{code}
local   all             postgres                                ident
local   all             all                                     md5
host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5
{code}

h2. Advanced Configuration and Use

{tip}
The following advanced steps should not be required for OOI Network development.
{tip}

h3. Postgres Usage

h4. Server Use

Start Postgres server in foreground mode (Ctrl-C to exit):

{code}
/usr/local/bin/postgres -D /usr/local/var/postgres
{code}

Start/stop Postgres server as a deamon manually (on Mac):

{code}
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start

pg_ctl -D /usr/local/var/postgres stop -s -m fast

pg_ctl --help
{code}

Connect:

{code}
psql postgres

psql -h localhost -U ion postgres
{code}

{tip}
The PostgreSQL default installation is configured to only accept localhost connections. For configuration how to make the database server accessible via the network, please see below.
{tip}


h4. Use Supervisord to automatically start Postgres as a deamon

{tip}
See the [Container Use Guide|CIDev:R2 Container Use Guide#supervisord] for details about supervisord and how to configure it. This is VERY recommended
{tip}

Add the following snippet to your /etc/supervisord.conf (change settings as needed):

{code}
[program:postgresql]
command=/usr/local/bin/postgres -D /usr/local/var/postgres
;autostart=false
autorestart=false
stopsignal=INT
{code}

To update the running supervisord instance execute:

{code}
supervisorctl reread
supervisorctl update
{code}

Go to [http://localhost:9001/] to verify running processes and check for errors

h4. Enabling remote access to server (if needed)

* Add line "host all all 10.10.29.0/24 trust" to file /usr/local/var/postgres/pg_hba.conf (correct IP)
* Uncomment/set "listen_addresses='*'" in file /usr/local/var/postgres/postgresql.conf
* Restart server

See [http://www.cyberciti.biz/tips/postgres-allow-remote-access-tcp-connection.html]




h3. Common SQL Queries

For common queries (select statements) and other database operations, see [CIDev:Postgres SQL Snippets]




h3. Postgres Configuration

{tip}
The following is for advanced settings and configuration only and is not required for initial use.
{tip}

h4. Directories/Files on a Mac:

|| Directory/File || Description ||
| /usr/local/Cellar/postgresql/9.n.n | Binaries placed by brew install for a Postgres installation |
| /usr/local/var/postgres | Postgres database, can be generated using initdb tool |
| /usr/local/var/postgres/server.log | Postgres server logfile |
| /usr/local/var/postgres/pg_hba.conf | Remote access configuration file |
| /usr/local/var/postgres/postgresql.conf | Postgres configuration file. For tuning, see below |

h4. Configuration settings in postgresql.conf:

Changing the postgres config file is not required for initial development. For more a more performant database configuration, edit the file and enter these values:

|| Setting || Suggested Value || Description ||
| max_connections | 100 | Max number of open connections. For a developer system, this is not relevant |
| shared_buffers | 512MB | Memory for caching blocks. Should be set to 1/4 physical memory on a dedicated system and less on a developer machine. |
| work_mem | 5MB | Memory per join/sort |
| maintenance_work_mem | 64MB | Memory reserved for vacuum |
| synchronous_commit | off | Await a full fsync on disk writes. Can be off for a developer machine. The risk is minimal data loss never data integrity. Do not switch fsync=off which risks database integrity. |
| checkpoint_segments | 32 | |
| checkpoint_completion_target | 0.9 | |
| random_page_cost | 2.0 | Instructs query planner to use a lower lookup cost for blocks (e.g. with SSDs) |
| effective_cache_size | 1024MB | Hints the query planner to guess the memory for database cache (shared buffers) and OS disk cache. Up to 1/2 of physical memory |

An example postgresql.conf for a well equipped MacBookPro is attached here.


h2. OOI Network Integration

h3. Pyon Datastore implementation

* We use Python psycopg2 2.5 library
* Code is split into a postgresql/base_store.py with all Pyon/ION independent datastore access and postgresql/datastore.py with ION specific extensions. 
* postgresql/pg_util contains common utilities, such as a connection pool, tracing connections and cursors for query logging and a common connection function.
* Psycopg2's cursor.execute(statement, arg_list_or_dict) does encoding and escaping of value arguments provided in the statement with %s or %(name)s notation. It does NOT handle database, table or column names. That's why there is a mix of python string concatenation and Psycopg2 argument replacement going on. It is not possible to use the Python % format operator, because it would conflict with the Psycopg2 argument replacement notation
* We concatenate our own SQL statements instead of using SQLAlchemy. SQL is already a public interface language, here in the Postgres dialect. The use of the SQLAlchemy makes code even more complex and currently there is no need for RDBMS portability. All SQL is located within 2 modules and can be replaced when desired
* The sysname handling is is follows. 
** The Postgres database itself is prefixed with the sysname, e.g. sysname_ion. A default sysname stars with ion, leading to a database name of ion_hostname_ion
* The interface for application level code to use efficient datastore queries is to either use a specialized RR function (e.g. find_resources_ext, find_objects) or to use find_by_view and then process the view results. Postgres does not have views as the ones defined in Couch. Therefore, the postgres backend takes the parameters for reading the view, translates them into a SQL query and then assembles the view contents. This may be a bit tricky because of the different ways in which a view can be used (key query, multi key query, range query, sorted result expectation, pagination) and there are case decisions in the code but the actual mapping is straightforward
* The datastore uses a container global connection pool. The maximum number of concurrent connections can be configured.

See also:
* [README.txt|https://github.com/ooici/pyon/blob/postgres_merge/pyon/datastore/postgresql/README.txt]
* [CIDev:M166 PostgreSQL data store]

h3. Database Layout

* There is one Postgres database created for each sysname. The database name is prefixed with the sysname, e.g. sysname_ion. A default sysname starts with ion, leading to a database name of e.g. "ion_hostname_ion"
* The database requires an admin user (role) and an application user. The admin user, typically the user login name on developer machines or "ooiuser" on production machines, creates the database, tables and functions. The application user, typically "ion", performs all the CRUD operations with data and is limited to these operations.
* Database SQL scripts are located in the ion-definitions git repository (submodule) under "res/datastore/postgresql"
** Common database extensions and function definitions are located in the db_init.sql script and executed once on the creation of a new database
** Datastore profile specific scripts are called "profile_name.sql" where name is the name of a datastore profile (e.g. resources, objects, events, state, basic)
* Tables and indexes are prefixed with "ion_"
* Logical pyon datastores are represented as a set of tables and indexes with a common prefix and a behavior defining profile. 
** A profile initialization script is executed (defaulting to basic), creating different tables and indexes. E.g. the datastore "resources" with profile "resources" leads to the execution of the SQL script "profile_resources.sql" and creates tables "ion_resources", and others prefixed with ion_resources.
** The number of datastores is not limited
* Indexes are prefixed with the table name they apply to and are suffixed with "_idx"
* Referential integrity is used that resource associations and attachments cannot be present if resource is deleted. Cascade delete is defined in the schema.

h3. Services Integration

* The container creates and provides access to a set of datastores via higher level interfaces
** "resources" for the resource registry, via container.resource_registry
** "objects" for the default object store, via container.object_store
** "events" for the event repository, via container.event_repository
** "state" for the process state repository, via container.state_repository
** Services may create other datastores, as needed
* The datastore interface is the same for all datastore types, e.g. couchdb and postgresql. A common DatastoreFactory provides a means to request a datastore based on the configured default datastore.
* The container default datastore is set via the CFG property "container.datastore.default_server". An entry must exist under CFG "server.servername" for the given type of datastore.
* Running the with a PostgreSQL datastore does not require ElasticSearch as a search index. It should be configured as disabled in the CFG via "bootstrap.use_es=False"
* The discovery service is aware of the Postgres datastore and will perform searches via the appropriate postgres datastore instead of via ElasticSearch. It supports both search strategies and will operate based on system configuration. 

h2. Documentation

*PostgreSQL*

* [http://www.postgresql.org/docs/9.3/static/] \- Server and SQL documentation