I have a master and a slave running, with the following contents in their pg_wal directories and in the archive directory:
ls -l /mnt/pgsql/archive/
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000001
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000002
-rw-rw-rw-. 1 root root 302 Feb 15 09:39 000000010000000000000002.00000028.backup
pg-hdp-node1.kitchen.local (primary):
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39 000000010000000000000002
-rw-------. 1 postgres postgres 302 Feb 15 09:39 000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39 000000010000000000000004
drwx------. 2 postgres postgres 96 Feb 15 09:44 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
-rw-------. 1 postgres postgres 0 Feb 15 09:39 000000010000000000000002.00000028.backup.done
-rw-------. 1 postgres postgres 0 Feb 15 09:39 000000010000000000000002.done
pg-hdp-node2.kitchen.local (secondary):
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres root 16777216 Feb 15 09:39 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44 000000010000000000000003
drwx------. 2 postgres root 6 Feb 15 09:39 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
Running diff on the secondary pg-hdp-node2.kitchen.local between /var/lib/pgsql/10/data/pg_wal/000000010000000000000002 and /mnt/pgsql/archive/000000010000000000000002 shows binary differences. As expected, the same diff on the primary pg-hdp-node1.kitchen.local shows no differences.
Failover is performed, and pg-hdp-node2.kitchen.local tries to archive WAL segment 000000010000000000000002 but fails because that segment has already been archived:
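To see where the two copies diverge, rather than only that they do, something like the following could be run on the secondary. This is only a sketch: wal_diff_summary is a hypothetical helper name of my own, and it relies on nothing beyond the standard cmp -l, head, awk, wc and tr utilities.

```shell
# Hypothetical helper (not a PostgreSQL tool): summarize how two files differ.
# Prints the 1-based offset of the first differing byte and the number of
# differing bytes. "cmp -l" lists one line per differing byte, with the byte
# offset in the first column.
wal_diff_summary() {
    first=$(cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{print $1}')
    count=$(cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' ')
    # When the files are identical, cmp -l prints nothing and "first" is empty.
    echo "first_diff=${first:-none} differing_bytes=$count"
}
```

For the segment in question it would be called as wal_diff_summary /var/lib/pgsql/10/data/pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002.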
2019-02-15 09:54:50.518 PST [780] DETAIL: The failed archive command was: test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002
The thread at https://www.postgresql.org/message-id/11b405a6-2176-9510-bf5b-ea9c0e860635%40pgmasters.net suggests handling this case by reporting success, but in my case the contents are different. I would think that the previously archived 000000010000000000000002 is the right WAL segment.
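For reference, this is roughly how I understand "report success" could be done safely, that is, only when the archived copy is byte-identical to the local one. It is a minimal sketch, not something I have in production: archive_wal is a hypothetical function name, and the three arguments (segment path, segment file name, archive directory) are my own convention.

```shell
# Hypothetical helper (not part of PostgreSQL): archive one WAL segment.
# Usage: archive_wal <path-to-segment> <segment-file-name> <archive-dir>
# Returns 0 if the segment was copied, or if an identical copy already exists.
# Returns non-zero if a different file with the same name is already archived.
archive_wal() {
    src=$1
    name=$2
    dir=$3
    dest="$dir/$name"

    if [ -f "$dest" ]; then
        # Already archived: succeed only when contents match byte for byte.
        if cmp -s "$src" "$dest"; then
            return 0
        fi
        echo "archive_wal: $dest exists with different contents" >&2
        return 1
    fi

    # Copy under a temporary name, then rename, so that a partial copy
    # never appears under the final segment name.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}
```

A wrapper script around this function would be called from archive_command with %p and %f as the first two arguments and /mnt/pgsql/archive as the third. With my current files it would still fail on 000000010000000000000002, since the two copies differ, which is exactly the case I am asking about.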
So my questions are as follows:
Why do the WAL segments differ?
How should this be resolved on the new primary?
-- Andre Piwoni