The instance is still running, so I tried to collect more information from it:
all databases are working as expected,
the only issue is that the monitoring SQL views (pg_stat_activity, pg_stat_replication) are not working as expected: they do not reflect the list of postgres processes seen on the command line.
on Master:
- pg_stat_activity is empty as well (client sessions are visible only in the ps f -fu postgres output: the CTSYSTEM lines)
- psql as postgres: select * from pg_stat_activity sees only its own session
- psql as unprivileged user (CTSYSTEM): select * from pg_stat_activity is empty
- replication works fine (created a table, and it was also created on all replicas)
- added the following lines to postgresql.conf and reloaded the configuration:
client_min_messages = debug5
log_min_messages = debug5
log_min_error_statement = debug5
- activity is seen in pg_log, including replication activity (pgreplic user), but there is still nothing in pg_stat_replication/pg_stat_activity
killed one slave postgres instance and restarted it:
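The mismatch on the master between the ps listing and pg_stat_activity can be put into numbers by counting client backends per role in the ps output. The following is only a minimal sketch: the function name backends_by_role is hypothetical, and it assumes the 9.3 process title layout seen in this thread ("postgres: role database host(port) state") and role names without spaces.

```shell
#!/bin/sh
# Count client backends per role in `ps` output read from stdin.
# Assumed client backend title (as in this thread):
#   postgres: CTSYSTEM lidb 192.168.102.13(58245) idle
# Auxiliary processes and wal senders do not have host(port) or [local]
# as the third word after "postgres:", so they are skipped.
backends_by_role() {
    awk '{
        for (i = 1; i + 3 <= NF; i++) {
            if ($i == "postgres:" &&
                ($(i + 3) ~ /\([0-9]+\)$/ || $(i + 3) == "[local]")) {
                count[$(i + 1)]++
                break
            }
        }
    }
    END { for (role in count) print role, count[role] }' | sort
}

# Possible usage on the master (not executed here):
#   ps -fu postgres | backends_by_role
#   psql -U postgres -Atc "select usename, count(*) from pg_stat_activity group by usename order by usename;"
```

If both commands report the same roles and counts, the view agrees with the process list; in the situation described here the second command would be expected to report only the psql session itself.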
- "standby "l2abrnch" has now caught up with primary"
- replication works fine
- no entries on Master in pg_stat_replication
- ps -ef shows the new wal-sender process on master and wal-receiver process streaming on this slave
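The same comparison can be made for replication: the number of wal sender processes seen by ps against the number of rows in pg_stat_replication. This is a sketch under assumptions: the function names count_wal_senders and report_mismatch are hypothetical, and the grep pattern relies on the 9.3 process title "postgres: wal sender process".

```shell
#!/bin/sh
# Count wal sender processes in `ps` output read from stdin.
# grep -c exits non-zero when nothing matches, hence the "|| true".
count_wal_senders() {
    grep -c 'postgres: wal sender process' || true
}

# Compare the two counts.
# $1 = wal senders seen by ps, $2 = rows in pg_stat_replication
report_mismatch() {
    if [ "$1" -eq "$2" ]; then
        echo "OK: $1 wal sender(s), $2 row(s) in pg_stat_replication"
    else
        echo "MISMATCH: $1 wal sender(s), $2 row(s) in pg_stat_replication"
    fi
}

# Possible usage on the master (not executed here):
#   senders=$(ps -fu postgres | count_wal_senders)
#   rows=$(psql -U postgres -Atc "select count(*) from pg_stat_replication;")
#   report_mismatch "$senders" "$rows"
```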
Version is:
PostgreSQL 9.3.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
I suspect something happened within the master server: pg_stat_activity and pg_stat_replication are not working as described above. They do not reflect the ps -ef list of postgres processes, nor the running SQL client and replication sessions.
What additional information would be useful to collect before restarting the master?
Regards, Andrej
2016-05-25 23:22 GMT+02:00 Andrej Vanek <andrej.vanek.sk@xxxxxxxxx>:
Streaming replication set-up, one master, 3 slaves connecting to it.
I expected ps -ef gets all wal-sender processes and SAME information I'll get via select * from pg_stat_replication.
Instead I observed:
- pg_stat_replication is empty
- 3 wal-sender processes up and running
- each slave has wal-receiver process running
- replication works (tried to create a table - it appears in all databases)
Question:
- why is pg_stat_replication empty?
Andrej
---------------details
[root@l2bmain ~]# tail /opt/pg_data/postgresql.conf
max_wal_senders = 5
hot_standby = on
wal_keep_segments = 128
archive_command = '/opt/postgres/dbconf/archive_command.sh %p %f'
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on
wal_sender_timeout = 1min
[root@l2bmain ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres 10797     1  0 15:53 ?        S      0:20 /usr/pgsql-9.3/bin/postgres -D /opt/pg_data -c config_file=/opt/pg_data//postgresql.conf
postgres 10820 10797  0 15:53 ?        Ss     0:00  \_ postgres: logger process
postgres 10823 10797  0 15:53 ?        Ss     0:00  \_ postgres: checkpointer process
postgres 10824 10797  0 15:53 ?        Ss     0:00  \_ postgres: writer process
postgres 10825 10797  0 15:53 ?        Ss     0:00  \_ postgres: wal writer process
postgres 10826 10797  0 15:53 ?        Ss     0:01  \_ postgres: autovacuum launcher process
postgres 10827 10797  0 15:53 ?        Ss     0:00  \_ postgres: archiver process last was 0000000100000000000000A3.00000028.backup
postgres 10828 10797  0 15:53 ?        Ss     0:03  \_ postgres: stats collector process
postgres 11286 10797  0 15:54 ?        Ss     0:08  \_ postgres: wal sender process pgreplic 192.168.204.12(55231) streaming 0/A401BED8
postgres 11287 10797  0 15:54 ?        Ss     0:06  \_ postgres: wal sender process pgreplic 192.168.204.11(42937) streaming 0/A401BED8
postgres 19322 10797  0 15:58 ?        Ss     0:08  \_ postgres: wal sender process pgreplic 192.168.101.11(52379) streaming 0/A401BED8
postgres 28704 10797  0 18:44 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.13(58245) idle
postgres  7256 10797  0 18:52 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.23(55190) idle
postgres  8667 10797  0 18:53 ?        Ss     0:00  \_ postgres: CTSYSTEM lidb 192.168.102.13(58287) idle
[root@l2bmain ~]# psql -U postgres -c "select * from pg_stat_replication;"
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+-------+---------------+----------------+----------------+-----------------+---------------+------------
(0 rows)
[root@l2bmain ~]# tail /opt/pg_data/pg_log/postgresql-Wed.log
2016-05-25 15:53:56 CEST:@:[8603] LOG: database system is shut down
2016-05-25 15:53:58 CEST:@:[10821] LOG: database system was shut down in recovery at 2016-05-25 15:53:56 CEST
2016-05-25 15:53:58 CEST:@:[10821] LOG: database system was not properly shut down; automatic recovery in progress
2016-05-25 15:53:58 CEST:@:[10821] LOG: consistent recovery state reached at 0/A2000090
2016-05-25 15:53:58 CEST:@:[10821] LOG: record with zero length at 0/A2000090
2016-05-25 15:53:58 CEST:@:[10821] LOG: redo is not required
2016-05-25 15:53:58 CEST:@:[10821] LOG: MultiXact member wraparound protections are now enabled
2016-05-25 15:53:58 CEST:@:[10797] LOG: database system is ready to accept connections
2016-05-25 15:53:58 CEST:@:[10826] LOG: autovacuum launcher started
[root@l2bmain ~]# ssh 192.168.101.11
Last login: Wed May 25 22:48:18 2016 from 192.168.101.12
[root@l2amain ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres  5730     1  0 15:58 ?        S      0:04 /usr/pgsql-9.3/bin/postgres -D /opt/pg_data -c config_file=/opt/pg_data//postgresql.conf
postgres  5754  5730  0 15:58 ?        Ss     0:00  \_ postgres: logger process
postgres  5755  5730  0 15:58 ?        Ss     0:00  \_ postgres: startup process recovering 0000000100000000000000A4
postgres  5773  5730  0 15:58 ?        Ss     0:12  \_ postgres: wal receiver process streaming 0/A401C030
postgres  5774  5730  0 15:58 ?        Ss     0:00  \_ postgres: checkpointer process
postgres  5775  5730  0 15:58 ?        Ss     0:00  \_ postgres: writer process
postgres  5776  5730  0 15:58 ?        Ss     0:00  \_ postgres: stats collector process
[root@l2amain ~]# psql -U postgres -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
[root@l2bmain ~]# ssh 192.168.204.11
Warning: Permanently added '192.168.204.11' (RSA) to the list of known hosts.
Last login: Wed May 25 16:28:49 2016 from 192.168.200.254
[root@l2abrnch ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres  8174     1  0 15:48 ?        S      0:04 /usr/pgsql-9.3/bin/postgres -D /opt/geo_stdby_data -c config_file=/opt/geo_stdby_data//postgresql.conf
postgres  8195  8174  0 15:48 ?        Ss     0:00  \_ postgres: logger process
postgres  8197  8174  0 15:48 ?        Ss     0:00  \_ postgres: startup process recovering 0000000100000000000000A4
postgres  8206  8174  0 15:48 ?        Ss     0:00  \_ postgres: checkpointer process
postgres  8207  8174  0 15:48 ?        Ss     0:00  \_ postgres: writer process
postgres  8208  8174  0 15:48 ?        Ss     0:00  \_ postgres: stats collector process
postgres 11414  8174  0 15:53 ?        Ss     0:13  \_ postgres: wal receiver process streaming 0/A401C0D0
[root@l2abrnch ~]# psql -U postgres -c "select pg_is_in_recovery();"
 pg_is_in_recovery
-------------------
 t
(1 row)
[root@l2abrnch ~]# logout
Connection to 192.168.204.11 closed.
[root@l2bmain ~]# ssh 192.168.204.12
Warning: Permanently added '192.168.204.12' (RSA) to the list of known hosts.
Last login: Wed May 25 14:58:03 2016 from 192.168.200.254
[root@l2bbrnch ~]# ps f -fu postgres
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
postgres 22885     1  0 15:48 ?        S      0:00 /usr/pgsql-9.3/bin/postgres -D /opt/geo_stdby_data -c config_file=/opt/geo_stdby_data//postgresql.conf
postgres 22913 22885  0 15:48 ?        Ss     0:00  \_ postgres: logger process
postgres 22914 22885  0 15:48 ?        Ss     0:00  \_ postgres: startup process recovering 0000000100000000000000A4
postgres 22917 22885  0 15:48 ?        Ss     0:00  \_ postgres: checkpointer process
postgres 22918 22885  0 15:48 ?        Ss     0:00  \_ postgres: writer process
postgres 22919 22885  0 15:48 ?        Ss     0:00  \_ postgres: stats collector process
postgres 26163 22885  0 15:54 ?        Ss     0:13  \_ postgres: wal receiver process streaming 0/A401C0D0