
Re: Warm standby: 1 to N

On Tue, Jun 02, 2009 at 02:52:26PM -0400, Bruce Momjian wrote:
> Yaroslav Tykhiy wrote:
> > Hi All,
> >
> > Let's consider the following case: WAL segments from a master have
> > been shipped to N warm standby servers, and now the master fails.
> > Using this or that mechanism, one of the warm standbys takes over and
> > becomes the new master. Now the question is what to do with the other
> > N-1 warm standbys.  By the failure, all N warm standbys were the same
> > exact copies of the master.  So at least in theory, the N-1 warm
> > standbys left can be fed with WAL segments from the new master.  Do
> > you think it will work in practice?  Are there any pitfalls?
>
> I think it should work.

Bruce, thank you very much for the encouragement! I had a chance to go a step further and fail over to a warm standby server without losing a single transaction. Now I'm happy to share my experience with the community.

The initial setup was as follows: server A was the master, servers B and C were warm standbys. The task was to fail over from A to B in a controlled manner whilst keeping C running as a warm standby.
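
For completeness, B and C were ordinary warm standbys of that era, i.e. each sat in recovery mode with a restore_command that waits for segments to appear in its local shipping spool. I'm not reproducing my exact files here, so the following is only a rough sketch with invented names (/some/path/restore.sh and /var/spool/pgwal are illustrative):

    # recovery.conf on B and C
    restore_command = '/some/path/restore.sh "%f" "%p"'

    #!/bin/sh
    # /some/path/restore.sh -- wait for the requested WAL segment to
    # appear in the local shipping spool, then hand it to the server.
    SPOOL=/var/spool/pgwal
    SEG="$1"    # %f: segment file name
    DEST="$2"   # %p: where the server wants it copied
    while [ ! -f "$SPOOL/$SEG" ]; do
        sleep 5
    done
    exec cp "$SPOOL/$SEG" "$DEST"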

Both B and C were initially running with archive_command set as follows:

archive_command='/some/path/archive.sh "%p" "%f"'

where archive.sh contained just "exit 1", so that a real archive script could later be mv'ed into place atomically without losing any WAL segments. (Note that the archiver process is supposed to queue segments and keep retrying as long as the archive command exits with a non-zero status.)
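
In other words, the dummy was nothing more than this, and switching WAL shipping on later is a single atomic rename (the .real name is just for illustration):

    #!/bin/sh
    # /some/path/archive.sh -- dummy version: always fail, so the
    # archiver keeps every finished segment queued and keeps retrying.
    exit 1

    # When shipping should start, drop the real script in atomically:
    mv /some/path/archive.sh.real /some/path/archive.sh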

After making sure B and C were keeping up with A, the latter was shut down. Then the last, incomplete WAL segment NNN was manually copied from A (pg_controldata was useful to find its name) to B's WAL shipping spool for the restore script to pick it up.
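
By hand, that step amounted to something like the following; the segment name and paths below are made up, and the real name follows from the "Latest checkpoint location" that pg_controldata reports after the clean shutdown:

    # On A, after the clean shutdown:
    pg_controldata /var/lib/pgsql/data    # note "Latest checkpoint location"
    scp /var/lib/pgsql/data/pg_xlog/000000010000000000000042 \
        b.example.com:/var/spool/pgwal/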

B processed segment NNN and, upon reaching its logical end, exited recovery mode. At this moment all the clients were switched over to B. B, now the master, continued writing its transaction log into segment NNN, filling it up and moving on to the next segment, NNN+1.

(On the one hand, it was quite unexpected that B didn't move on to a new timeline upon exiting recovery mode. On the other hand, had it done so, the whole trick would have been impossible. Please correct me if I'm wrong. Just in case, the PostgreSQL version was 8.0.6. Yes, it's ancient and sorely needs an upgrade.)

Now segment NNN was full and contained both the last transactions from A and the first transactions from B. It was time to ship NNN from B to C in order to bring C in line with B -- without disrupting C's recovery mode. A real archive script was substituted for the dummy script on B. At the next retry the script shipped segment NNN to C and so the WAL shipping train got going B->C.
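
For the record, a minimal sketch of what such a script can look like (host and spool names are made up): copy under a temporary name first so C's restore script never sees a half-written segment, and exit non-zero on any failure so the archiver retries.

    #!/bin/sh
    # /some/path/archive.sh -- real version on B: push each finished
    # segment to C's local WAL spool.
    WALPATH="$1"    # %p: path of the segment to archive
    WALFILE="$2"    # %f: bare segment file name
    SPOOL=/var/spool/pgwal
    scp "$WALPATH" "c.example.com:$SPOOL/$WALFILE.tmp" &&
        ssh c.example.com "mv '$SPOOL/$WALFILE.tmp' '$SPOOL/$WALFILE'"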

A possible pitfall to watch out for is this: if the WAL shipping spool is shared between B and C, e.g. NFS-based, just copying segment NNN to it will make both B and C exit recovery mode. To avoid that, at least in theory, segment NNN can be copied directly into B's pg_xlog, and then B's restore command needs to be signalled to return a non-zero status. According to the manual, the recovery process is supposed to look in pg_xlog as a last resort when the restore command returns an error status. However, I didn't try that as I had separate, local WAL spools on B and C.
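
Had I needed that, the restore script on B could have been taught to give up on demand, e.g. when a trigger file appears, so that the server stops waiting on the shared spool and falls back to its own pg_xlog. A sketch (the trigger path is invented):

    #!/bin/sh
    # restore.sh on B: serve segments from the spool as usual, but
    # once the trigger file exists, return non-zero so the server
    # gives up on the spool and looks in its own pg_xlog instead.
    SPOOL=/var/spool/pgwal
    TRIGGER=/tmp/stop_waiting_for_wal
    SEG="$1"; DEST="$2"
    while [ ! -f "$SPOOL/$SEG" ]; do
        [ -f "$TRIGGER" ] && exit 1
        sleep 5
    done
    exec cp "$SPOOL/$SEG" "$DEST"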

Hoping all this stuff helps somebody...

Yar

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
