I am not sure whether you have already implemented this and simply not mentioned it, but you might want to consider an alert for any out-of-the-ordinary gap between standby and primary, since that is something you may have to address manually when it happens. In our case a gap of 10 minutes is unacceptable; we ingest data into the primary constantly. No matter how much we automate and make the setup fault tolerant, we still need a closed-loop mechanism. Rsync can fail at any time, and you don't want that to go unnoticed. My 2c.

-----Original Message-----
From: pgsql-admin-owner@xxxxxxxxxxxxxx [mailto:pgsql-admin-owner@xxxxxxxxxxxxxx] On Behalf Of Andrew W. Gibbs
Sent: Tuesday, April 08, 2014 8:56 AM
To: pgsql-admin@xxxxxxxxxxxxxx
Subject: robust archiving of WAL segments

I have a Postgres 8.4.X database that is subject to a fairly high ingest rate. It is not uncommon for the system to generate 1-3 WAL segments of 16 MB every minute. I endeavor to keep these WAL segments around both for maintaining standby servers and for disaster recovery. I have over time ended up with a server architecture that looks like this:

* the master database has an archive_command that copies WAL segments out of pg_xlog and puts them in another local directory that we'll call wal_segment_staging_area
* a daemon shovels files out of wal_segment_staging_area using rsync and places them on a separate file server in a directory we'll call wal_segment_archive
* standby servers run in continuous recovery mode with a restore_command that sources WAL segments from the file server's wal_segment_archive directory

The major considerations that governed this design are:

* the high rate of inflow, which generates a large number of segments quickly
* the desire not to have the WAL segments hosted on the master database server long-term, for performance and fault-tolerance reasons
* the fear of having an archive_command that requires network connectivity to another server, the connectivity getting severed, and pg_xlog filling up and wedging the master server (see the sketch after this list)
* the problem that if you fall substantially behind due to a transient networking problem you want to be able to catch up as quickly as possible, which among other things means you don't want to open a separate remote connection to transfer every segment in a loop, hence rsync
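A minimal sketch of what such a staging-area archive_command wrapper might look like, assuming a script installed as /usr/local/bin/archive_wal.sh and a staging directory at /var/lib/pgsql/wal_segment_staging_area (both names are hypothetical); it copies the segment locally and forces it to disk before reporting success, which also bears on the durability concerns discussed further down:

#!/bin/bash
# Minimal sketch of a staging-area archive_command wrapper.  All paths and
# the script name are hypothetical.  Configured in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
set -e

SRC="$1"      # %p: path of the WAL file to archive, as supplied by Postgres
NAME="$2"     # %f: bare segment file name
STAGING=/var/lib/pgsql/wal_segment_staging_area
TMP="$STAGING/$NAME.tmp"
DEST="$STAGING/$NAME"

# Refuse to overwrite an existing non-empty copy (per the docs' advice),
# but treat a zero-length leftover from a crashed earlier attempt as absent
# so that archiving does not wedge on it forever.
if [ -s "$DEST" ]; then
    cmp -s "$SRC" "$DEST" && exit 0   # already archived with identical contents
    echo "archive_wal.sh: refusing to overwrite $DEST" >&2
    exit 1
fi

# Copy under a temporary name, force the data to disk, rename into place,
# then force the directory entry to disk.  Per-file sync needs GNU
# coreutils >= 8.24; older systems would need a small helper that calls fsync().
cp "$SRC" "$TMP"
sync "$TMP"
mv "$TMP" "$DEST"
sync "$STAGING"
exit 0

Removing the segment from pg_xlog remains Postgres's job; the script only has to guarantee that it returns success to Postgres no earlier than the moment the staged copy is durably on disk.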
Over the past couple of years that this arrangement has been in operation, I have on a handful of occasions seen what I believe to be the following series of events:

* a transfer of a WAL segment occurs, either from pg_xlog to wal_segment_staging_area or from wal_segment_staging_area to wal_segment_archive
* the process transferring the file believes that the operation has succeeded and deletes its local copy of the file
* the server that received the file experiences a transient fault that causes it to wedge, and an administrator does a hard reset out of desperation
* when the receiving server comes back up, the file that was created during the archive operation is there, but it is zero-length, presumably indicating that from the standpoint of the application the write operation had succeeded, but the file's contents were still cached and never durably sync'd to disk
* when this has happened to a file being placed in wal_segment_staging_area, a subsequent power-on of the master database has yielded repeated error messages in the log files saying that the archive_command had failed (because, following the Postgres docs' advice, the archive_command tests whether the file already exists), and eventually Postgres says "I give up trying to archive this WAL segment!" and deletes it from pg_xlog without it ever being archived, which blew my mind the first time I saw it happen because I thought the archive_command was supposed to be retried indefinitely until success was reported
* when this has happened to a file being placed in wal_segment_archive, the effect is similar, although you don't get any error messages from Postgres, just an archive that contains some zero-length WAL segments
* in both cases you end up with a corrupted WAL segment stream, such that, until you have made a new base backup, a point-in-time restore would only be able to recover up to the point of the event, because a subset of the WAL is lost forever

Thoughts and questions:

* it seems that robust archiving of WAL segments requires that your scripts be as pedantic about transferring the files as Postgres is about creating them; Postgres doesn't report a transaction as committed until the relevant WAL data has been durably written to disk, and your archiving scripts cannot afford to delete the source file until the destination file has likewise been durably written
* is it a bug that Postgres gives up trying to archive a WAL segment after several failed tries? or have I somehow misconfigured something? maybe, after the fashion of implementing a restore_command for continuous recovery mode (i.e. a restore_command that does not return until a file has been restored), I ought likewise to implement an archive_command that does not return until the file has been successfully archived?
* the Postgres docs chapter "Continuous Archiving and Point-in-Time Recovery" provides a lot of good information about rigging up an environment for dealing with all of this, but to my knowledge it gives no treatment to the matters I describe here, which may have left a lot of installations in a somewhat precarious situation; there probably ought to be a treatment of ensuring that transmitted files have been written durably
* I'm not even sure what a good solution is for guaranteeing that transmitted files have been durably persisted using common tools; commonly available rsync implementations don't seem to support a "please call fsync" option, though some Googling turns up discussion of a patched version someone created for that purpose; maybe I could invoke the shell command "sync" as part of the dance, but that doesn't seem great either: the first transfer happens on the master database, where I don't want to issue a sync request against all file systems because that will kill database performance, and the second transfer happens via rsync, where you wouldn't be able to call "sync" until rsync had already deleted the source files, thus creating a race condition (one possible workaround is sketched below)
* I can imagine what full solutions would look like, but not ones that don't involve a fair amount of custom code to solve what feels like it ought to be a solved problem

How have you folks dealt with needing this level of robustness?

-- AWG
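One way to avoid the sync/rsync race described above is to split the shovel daemon's cycle into transfer, remote flush, and verified delete. The following is a sketch only, under the assumptions that the archive file server is reachable as archive-host over passwordless ssh and that the directory names match the ones used in this thread (all of these are hypothetical):

#!/bin/bash
# Sketch of a shovel script that does not rely on rsync's own source
# deletion: transfer first, flush the receiver, then delete only files
# whose remote copies are confirmed.  Host name and paths are hypothetical.
set -e

STAGING=/var/lib/pgsql/wal_segment_staging_area
REMOTE=archive-host
REMOTE_DIR=/srv/wal_segment_archive

# 1. Copy everything currently staged, WITHOUT --remove-source-files, so a
#    crash on the receiver cannot lose data that no longer exists locally.
#    In-progress temporaries (the .tmp convention from the earlier sketch)
#    are skipped.
rsync -a --exclude='*.tmp' "$STAGING"/ "$REMOTE:$REMOTE_DIR"/

# 2. Flush the receiver's caches.  A plain sync is acceptable there because
#    the file server is not the database host.
ssh "$REMOTE" sync

# 3. Delete local copies only where the remote counterpart exists with the
#    same size (GNU stat syntax).
for f in "$STAGING"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    local_size=$(stat -c %s "$f")
    remote_size=$(ssh "$REMOTE" "stat -c %s '$REMOTE_DIR/$name'" 2>/dev/null || echo missing)
    if [ "$remote_size" = "$local_size" ]; then
        rm -- "$f"
    fi
done

Because nothing is deleted until after the receiver has been flushed and the sizes match, the source files cannot disappear before the sync happens; the worst case after a crash is re-transferring segments that had already been shipped.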