I have a 2-node active/standby setup with synchronous streaming replication enabled.
WAL segments are replicated to the standby as expected.
However, if I manually kill the postgres processes on the primary with pkill, I end up with the standby's WAL position behind the primary's. The primary seems to switch to a new WAL segment automatically after I restart it.
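For context, this is roughly how I check that synchronous replication is active on the primary (a minimal sketch; the setting values and the standby name are assumptions about my setup):

-- On the primary:
SHOW synchronous_commit;            -- 'on' (assumed)
SHOW synchronous_standby_names;     -- e.g. 'standby1' (example name)
SELECT application_name, state, sync_state
FROM pg_stat_replication;           -- sync_state should be 'sync'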
Example:
Before killing the pg process:
Primary and standby appear to be in sync:
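(The outputs below are the results of these queries, run on the standby and on the primary, respectively:)

-- On the standby:
SELECT pg_last_wal_receive_lsn();

-- On the primary:
SELECT pg_current_wal_lsn();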
pg_last_wal_receive_lsn
-------------------------
0/5E0108B0
(1 row)
pg_current_wal_lsn
--------------------
0/5E0108B0
(1 row)
Now I do "pkill postgres". On the primary, the WAL directory has
00000001000000000000005F as the latest segment (file) (I expected 5E, but it gets incremented unexpectedly),
while the standby has
00000001000000000000005E as its latest segment.
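For cross-checking, the segment file corresponding to the primary's current write position can be looked up once the primary is running again (a small sketch; in my case this should presumably report the ...5F segment):

SELECT pg_walfile_name(pg_current_wal_lsn());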
The problem is that if I want to restart the primary as a standby (swapping the roles), it complains that the requested WAL is too far in the future and not available on the new primary (old standby):
could not receive data from WAL stream: ERROR: requested starting point 0/5F000000 is ahead of the WAL flush position of this server 0/5E0CBF38
requested starting point 0/5F000000 is ahead of the WAL flush position of this server 0/5E0D3200
Isn't the primary (original primary) expected to know how far along its standby is?
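As far as I understand, while the standby is connected the primary tracks its positions in pg_stat_replication; this is roughly what I would look at (column names as of PostgreSQL 10):

SELECT application_name, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;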
Doing a base backup recovery is not an option for me at this point.
The version is PostgreSQL 10.2 ("pg_ctl (PostgreSQL) 10.2").