Hello, Can someone please help me with the below query? Regards, Mariya From: Mariya Rampurawala <Mariya.Rampurawala@xxxxxxxxxxx> Hi, I am working on providing HA for replication, using automation scripts.
My set up consists on two nodes, Master and Slave. When master fails, The slave is promoted to master. But when I try to re-register the old master as slave, the pg_rewind command fails. Details
below. Non-default entries in Postgresql.conf =================================== archive_mode = on archive_command =
'scp %p root@10.209.56.88:/pg_backup/%f' archive_cleanup_command =
'/bin/ssh root@10.209.56.88 "/usr/pgsql-12/bin/pg_archivecleanup
/pg_backup %r"' synchronous_standby_names = '*' listen_addresses = '*' wal_level = hot_standby max_wal_senders = 32 max_replication_slots = 8 hot_standby = on wal_log_hints = on synchronous_commit = on hot_standby_feedback = on restart_after_crash = off max_connections = 100 =================================== Postgresql.auto.conf at standby: =================================== primary_conninfo =
'user=replicator host=10.209.57.16 port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any' restore_command =
'scp root@10.209.56.88:/pg_backup/%f %p' recovery_target_timeline='latest' =================================== Reproduction steps:
-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/
start waiting for server to start....2020-05-11 05:26:01.665 IST [23999] LOG:
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:26:01.665 IST [23999] LOG:
listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:26:01.665 IST [23999] LOG:
listening on IPv6 address "::", port 5432 2020-05-11 05:26:01.666 IST [23999] LOG:
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:26:01.668 IST [23999] LOG:
listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:26:01.688 IST [23999] LOG:
redirecting log output to logging collector process 2020-05-11 05:26:01.688 IST [23999] HINT:
Future log output will appear in directory "log". done server started
/usr/pgsql-12/bin/pg_basebackup -D /pg_mnt/pg-12/data -X stream -R -c fast -U replicator --host=10.209.57.16 --port=5432
-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data start waiting for server to start....2020-05-11 05:29:09.422 IST [22624] LOG:
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:29:09.422 IST [22624] LOG:
listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:29:09.422 IST [22624] LOG:
listening on IPv6 address "::", port 5432 2020-05-11 05:29:09.423 IST [22624] LOG:
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:29:09.425 IST [22624] LOG:
listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:29:09.442 IST [22624] LOG:
redirecting log output to logging collector process 2020-05-11 05:29:09.442 IST [22624] HINT:
Future log output will appear in directory "log". .. done server started
-bash-4.2$ psql psql (12.2) Type "help" for help. postgres=# \x Expanded display is on. postgres=# select * from pg_stat_replication; -[ RECORD 1 ]----+--------------------------------- pid
| 24988 usesysid
| 16385 usename
| replicator application_name | walreceiver client_addr
| 10.209.57.14 client_hostname
| client_port
| 40802 backend_start
| 2020-05-11 05:29:14.752341+05:30 backend_xmin
| 588 state
| streaming sent_lsn
| 5/A60000D8 write_lsn
| 5/A60000D8 flush_lsn
| 5/A60000D8 replay_lsn
| 5/A60000D8 write_lag
| flush_lag
| replay_lag
| sync_priority
| 1 sync_state
| sync reply_time
| 2020-05-11 05:29:54.895857+05:30
#psql -h 10.209.57.16
--user=postgres psql (12.2) Type "help" for help. postgres=# \c vcs You are now connected to database "vcs" as user "postgres". vcs=# create table c1 as select generate_series(1,10000000000);
-bash-4.2$ ps -ef | grep post root 1860 18990 0 May08 pts/1 00:00:00 vim
postgresql.cluster.conf root 1925 14243 0 Apr07 pts/0 00:00:00 su -
postgres postgres 1926 1925
0 Apr07 pts/0 00:00:00 -bash root 7321 1 0 Mar14 ? 00:00:15 /usr/libexec/postfix/master
-w postfix 7350 7321
0 Mar14 ? 00:00:03 qmgr -l -t unix -u postfix 10410 7321
0 04:47 ? 00:00:00 pickup -l -t unix -u postgres 23999
1 0 05:26 ? 00:00:00 /usr/pgsql-12/bin/postgres
-D /pg_mnt/pg-12/data postgres 24000 23999
0 05:26 ? 00:00:00 postgres: logger postgres 24002 23999
0 05:26 ? 00:00:00 postgres: checkpointer postgres 24003 23999
0 05:26 ? 00:00:00 postgres: background writer postgres 24004 23999
0 05:26 ? 00:00:01 postgres: walwriter postgres 24005 23999
0 05:26 ? 00:00:00 postgres: autovacuum launcher
postgres 24006 23999
0 05:26 ? 00:00:00 postgres: archiver archiving
0000003500000005000000C7 postgres 24007 23999
0 05:26 ? 00:00:00 postgres: stats collector postgres 24008 23999
0 05:26 ? 00:00:00 postgres: logical replication launcher
postgres 24988 23999
1 05:29 ? 00:00:03 postgres: walsender replicator
10.209.57.14(40802) streaming 6/4A480000 postgres 25906 23999
76 05:31 ? 00:00:47 postgres:
postgres vcs 10.209.57.15(58642) CREATE TABLE AS postgres 26240 24006
1 05:32 ? 00:00:00 scp pg_wal/0000003500000005000000C7 root@10.209.56.88:/pg_backup/0000003500000005000000C7 postgres 26241 26240
0 05:32 ? 00:00:00 /usr/bin/ssh -x -oForwardAgent=no -oPermitLocalCommand=no -oClearAllForwardings=yes -l root -- 10.209.56.88 scp -t /pg_backup/0000003500000005000000C7 postgres 26242 32277
0 05:32 pts/0 00:00:00 ps -ef postgres 26243 32277
0 05:32 pts/0 00:00:00 grep --color=auto post root 32276 24501 0 May10 pts/0 00:00:00 su -
postgres postgres 32277 32276
0 May10 pts/0 00:00:00 -bash -bash-4.2$ kill -9 23999 -bash-4.2$ ps -ef | grep post root 1860 18990 0 May08 pts/1 00:00:00 vim
postgresql.cluster.conf root 1925 14243 0 Apr07 pts/0 00:00:00 su -
postgres postgres 1926 1925
0 Apr07 pts/0 00:00:00 -bash root 7321 1 0 Mar14 ? 00:00:15 /usr/libexec/postfix/master
-w postfix 7350 7321
0 Mar14 ? 00:00:03 qmgr -l -t unix -u postfix 10410 7321
0 04:47 ? 00:00:00 pickup -l -t unix -u postgres 24000
1 0 05:26 ? 00:00:00 postgres: logger postgres 25906
1 81 05:31 ? 00:01:30 postgres:
postgres vcs 10.209.57.15(58642) CREATE TABLE AS postgres 26514 32277
0 05:33 pts/0 00:00:00 ps -ef postgres 26515 32277
0 05:33 pts/0 00:00:00 grep --color=auto post root 32276 24501 0 May10 pts/0 00:00:00 su -
postgres postgres 32277 32276
0 May10 pts/0 00:00:00 -bash
-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data promote waiting for server to promote..... done server promoted -bash-4.2$ psql psql (12.2) Type "help" for help. postgres=# checkpoint; CHECKPOINT postgres=#
-bash-4.2$ /usr/pgsql-12/bin/pg_rewind -D /pg_mnt/pg-12/data --source-server="host=10.209.57.17
port=5432 user=postgres dbname=postgres" pg_rewind: fatal: target server must be shut down cleanly
-bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/ -m smart start pg_ctl: another server might be running; trying to start server anyway waiting for server to start....2020-05-11 05:36:29.005 IST [27684] LOG:
starting PostgreSQL 12.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit 2020-05-11 05:36:29.006 IST [27684] LOG:
listening on IPv4 address "0.0.0.0", port 5432 2020-05-11 05:36:29.006 IST [27684] LOG:
listening on IPv6 address "::", port 5432 2020-05-11 05:36:29.007 IST [27684] LOG:
listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432" 2020-05-11 05:36:29.008 IST [27684] LOG:
listening on Unix socket "/tmp/.s.PGSQL.5432" 2020-05-11 05:36:29.027 IST [27684] LOG:
redirecting log output to logging collector process 2020-05-11 05:36:29.027 IST [27684] HINT:
Future log output will appear in directory "log". ........................................................... stopped waiting pg_ctl: server did not start in time -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/ status pg_ctl: server is running (PID: 27684) /usr/pgsql-12/bin/postgres "-D" "/pg_mnt/pg-12/data" -bash-4.2$ /usr/pgsql-12/bin/pg_ctl -D /pg_mnt/pg-12/data/ -m smart stop waiting for server to shut down..... done server stopped
pg_rewind: servers diverged at WAL location 6/B9FFFFD8 on timeline 53 pg_rewind: error: could not open file "/pg_mnt/pg-12/data/pg_wal/0000003500000006000000B9": No such file or directory pg_rewind: fatal: could not find previous WAL record at 6/B9FFFFD8 I have tried this multiple times but always face the same error. Can someone help me resolve this? Thanks and Regards, Mariya |