Re: Strange replication problem - segment restored from archive but still requested from master

Guillaume Lelarge <guillaume@xxxxxxxxxxxx> · Mon, 25 May 2015 17:35:20 +0200

2015-05-25 15:15 GMT+02:00 Piotr Gasidło <quaker@xxxxxxxxxxxxxx>:
2015-05-25 11:30 GMT+02:00 Guillaume Lelarge <guillaume@xxxxxxxxxxxx>:

>> I currently have wal_keep_segments set to 0.
>> Will setting this to a higher value help? As I understand it, the master
>> won't delete the segment and could stream it to the slave on request, so
>> it will help.
>
> It definitely helps, but the issue could still happen.

What conditions must be met for the issue to happen?

Very high WAL traffic can make the slave lag so far behind that the master has already recycled the segment the slave needs, even with wal_keep_segments set.
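
To put rough numbers on that, here is a minimal sketch. It assumes the default 16 MB WAL segment size and a steady WAL rate on the master; the function name and the figures are made up for illustration, and real traffic is bursty, so treat the result as an order of magnitude only.

```python
SEGMENT_SIZE_MB = 16  # default WAL segment size; an assumption if the server was built differently


def tolerated_lag_seconds(wal_keep_segments, wal_rate_mb_per_s):
    """Rough time the slave may fall behind before the master recycles a segment it needs."""
    if wal_rate_mb_per_s <= 0:
        raise ValueError("wal_rate_mb_per_s must be positive")
    retained_mb = wal_keep_segments * SEGMENT_SIZE_MB
    return retained_mb / wal_rate_mb_per_s


# Hypothetical figures: 64 retained segments, 8 MB/s of WAL generated on the master.
print(tolerated_lag_seconds(64, 8.0))  # 1024 MB / 8 MB/s -> 128.0
```

With wal_keep_segments = 0 the same arithmetic gives no guaranteed margin at all, which is why raising it helps without removing the problem entirely.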

Both archive_command on the master and restore_command on the slave are
set and working. wal_keep_segments is also set.

I see no point of failure, only a delay in the case of high WAL traffic
on the master:

- the slave starts by restoring WALs from the archive,
- then it connects to the master and notices that, before the master's
  latest WAL, it needs the previous one ("the issue"),
- the slave asks the master for the previous WAL and gets it: job done,
  streaming replication is set up, exit,
- if it is unable to get it (WAL traffic is high, and between restoring
  the last WAL from the archive and asking the master for the next one,
  more than wal_keep_segments segments were recycled), it goes back to
  looking for WALs in the archive.

Do I get it right?

Yes. If archive_command (on the master) and restore_command (on the slave) are set correctly, there's no point of failure. You might still get the "WAL not available" error message, but the slave can resynchronize itself from the archived WALs.
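
The fallback behaviour discussed in this thread can be sketched as a toy model. This is not PostgreSQL's actual recovery code: segments are plain integers, the archive and the master's retained WAL are sets, and the function names are hypothetical. It only illustrates why a recycled segment is not fatal once it has been archived.

```python
def next_source(segment, archive, master_retained):
    """Pick where the slave gets `segment` from in this toy model.

    Returns "archive", "stream", or "wait" (nobody can serve it yet).
    """
    if segment in archive:
        return "archive"  # restore_command would succeed
    if segment in master_retained:
        return "stream"   # master still has it, streaming works
    return "wait"         # recycled and not archived yet, or not produced yet


def replay(start, target, archive, master_retained):
    """Replay segments start..target and record the source used for each one."""
    sources = []
    for segment in range(start, target + 1):
        source = next_source(segment, archive, master_retained)
        if source == "wait":
            break  # the slave would retry later instead of failing for good
        sources.append((segment, source))
    return sources


# Hypothetical state: the master recycled everything up to segment 8,
# but archive_command had already copied those segments to the archive.
archive = {1, 2, 3, 4, 5, 6, 7, 8}
master_retained = {9, 10}
print(replay(6, 10, archive, master_retained))
```

In this model the slave only stalls when a segment is in neither place, which with a working archive_command should be a matter of waiting, not a broken standby.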

-- 
Guillaume.
  http://blog.guillaume.lelarge.info
  http://www.dalibo.com