I am trying to create a highly available PostgreSQL database set-up using the Compose Governor project [https://github.com/compose/governor].
I have three Postgres servers and I want to ensure that full service remains when two out of three database servers remain online. Having these systems immediately in synchronisation with each other is of utmost importance to the application they will serve.
So from what I understand, I'm aiming to do Synchronous Streaming Replication, where writes to two out of three of the servers is considered safe.
When the servers come online and when I try and and create a database for my application, my process hangs and I can't yet understand why or what I do to fix this.
Systemd reports the following:
# systemctl status governor
● governor.service - Governor: A runner to orchestrate a high-availability PostgreSQL service
Loaded: loaded (/etc/systemd/system/governor.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-10-31 10:37:42 GMT; 6min ago
Main PID: 11685 (python)
CGroup: /system.slice/governor.service
├─11685 /bin/python /opt/governor/governor.py /etc/governor/config.yml
├─11705 /usr/pgsql-9.6/bin/postgres -D /var/lib/pgsql/governor/pgsqla -c listen_addresses=192.168.2.111 -c port=5432 -c hot_standby=on -c synchronous_commit=on -c...
├─11706 postgres: logger process
├─11708 postgres: checkpointer process
├─11709 postgres: writer process
├─11710 postgres: stats collector process
├─11752 postgres: postgres postgres [local] idle
├─11754 postgres: wal writer process
├─11755 postgres: autovacuum launcher process
├─11756 postgres: archiver process last was 000000020000000000000005
├─11809 postgres: wal sender process replicator 192.168.2.112(32852) streaming 0/6000968
├─12025 postgres: postgres postgres [local] CREATE DATABASE waiting for 0/6000850
└─12050 postgres: wal sender process replicator 192.168.2.113(48000) streaming 0/6000968
The full command string of the postgres process is:# ps -fwwp 11705
UID PID PPID C STIME TTY TIME CMD
postgres 11705 1 0 10:37 ? 00:00:00 /usr/pgsql-9.6/bin/postgres -D /var/lib/pgsql/governor/pgsqla -c listen_addresses=192.168.2.111 -c port=5432 -c hot_standby=on -c synchronous_commit=on -c wal_keep_segments=8 -c wal_level=hot_standby -c archive_command=mkdir -p ../wal_archive && cp %p ../wal_archive/%f -c max_wal_senders=5 -c archive_timeout=1800s -c archive_mode=on -c synchronous_standby_names=2 ( pgsqla, pgsqlb, pgsqlc ) -c max_replication_slots=5
Postgres reports the following:
postgres=# select * from pg_stat_activity; datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query -------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+------------------------------- +-------------------------------+-------------------------------+-----------------+------------+--------+-------------+--------------+------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- 13322 | postgres | 11752 | 10 | postgres | | | | -1 | 2016-10-31 10:37:43.863694+00 | | 2016-10-31 10:40:54.273678+00 | 2016-10-31 10:40:54.273893+00 | | | idle | | | DO LANGUAGE plpgsql $$DECLARE somevar VARC HAR; BEGIN SELECT slot_name INTO somevar FROM pg_replication_slots WHERE slot_name = 'pgsqlc' LIMIT 1; IF NOT FOUND THEN PERFORM pg_create_physical_replication_slot('pgsqlc'); E ND IF; END$$; 13322 | postgres | 12025 | 10 | postgres | | | | -1 | 2016-10-31 10:38:26.715132+00 | 2016-10-31 10:38:26.717104+00 | 2016-10-31 10:38:26.717104+00 | 2016-10-31 10:38:26.717105+00 | | | active | 1756 | | CREATE DATABASE "database" 13322 | postgres | 12184 | 10 | postgres | psql | | | -1 | 2016-10-31 10:40:34.102113+00 | 2016-10-31 10:40:57.687182+00 | 2016-10-31 10:40:57.687182+00 | 2016-10-31 10:40:57.687197+00 | | | active | | 1756 | select * from pg_stat_activity; (3 rows) postgres=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+--------------------- pgsqlb | | physical | | | t | 11809 | | | 0/6000888 | pgsqlc | | physical | | | t | 12050 | | | 0/6000888 | (2 rows)
What can be done to unfreeze this connection and correctly create my applications database correctly?
Many Thanks,
Adam.