pg_rewind rewinded too much...

marin@xxxxxxxx · Tue, 29 Nov 2016 09:30:26 +0100

Hi,

I am doing some testing with pg_rewind and I encountered a very strange 
situation. So, let me describe the scenario:

1. Create a master (server1) node (max_wal_size = 2GB, 
checkpoint_timeout = 10min)
2. Create a test database on it (server1)
3. Create a test table (server1)
4. Insert a couple of million rows ( >> 5GB) but cancel the query at 
some point leaving a few gigs of useless WAL files (server1)
5. Insert a couple of thousand rows ( < 1GB) (server1)
6. Build a slave (server2) and connect to the master (server1)
7. Promote the slave (server2 now master)
8. Insert one row on the old master (server1)
9. Shutdown the old master (server1)
10. Issue a pg_rewind on the old master (server1)
11. Configure recovery.conf and start the new slave (server1)

All of this under 10 minutes :)

The timeline switch is at:
1       1/B2A64F80      no recovery target specified

The pg_controldata on the slave (server1) gives:
Latest checkpoint location:           1/B3000108
Prior checkpoint location:            1/B3000060
Latest checkpoint's REDO location:    1/B30000D0
Latest checkpoint's REDO WAL file:    0000000200000001000000B3

But when starting the slave i get the following errors in the log file:

2016-11-28 15:38:44 UTC: WAL 00000002.history successfully restored
2016-11-28 15:38:43 UTC [6]: [14-1] user=,db= LOG:  restored log file 
"00000002.history" from archive
2016-11-28 15:38:44 UTC: Error copying WAL 00000002000000010000009D!
2016-11-28 15:38:44 UTC: Error copying WAL 00000001000000010000009D!
2016-11-28 15:38:44 UTC [494]: [1-1] user=,db= LOG:  started streaming 
WAL from primary at 1/9D000000 on timeline 2
2016-11-28 15:38:44 UTC [494]: [2-1] user=,db= FATAL:  could not receive 
data from WAL stream: ERROR:  requested WAL segment 
00000002000000010000009D has already been removed

So the problem is that the slave (server1) went back in time, to a 
location that is not on the new master (first WAL available on the new 
master is 0000000100000001000000AE). The old archive on the slave 
(server1) is gone (deleted) so there is no way to fix this. And this is 
not a problem per se, because you can use pg_basebackup or rsync instead 
of pg_rewind. The real problem is that I can't diagnose this problem 
until I start the server process and read the logs. Shouldn't 
pg_controldata report the first WAL needed to start the instance?

Regards,
Mladen Marinović

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general