I am not sure whether you have already implemented this and simply not mentioned it, but you might want to consider an alert for any out-of-the-ordinary gap between standby and primary, since that is something you may have to address manually when it happens. In our case a gap of 10 minutes is unacceptable; we ingest data into the primary constantly. No matter how much we automate and make the setup fault tolerant, we still need a closed-loop mechanism. Rsync can fail at any time, and you don't want that to go unnoticed. My 2c.

-----Original Message-----
From: pgsql-admin-owner@xxxxxxxxxxxxxx [mailto:pgsql-admin-owner@xxxxxxxxxxxxxx] On Behalf Of Andrew W. Gibbs
Sent: Tuesday, April 08, 2014 8:56 AM
To: pgsql-admin@xxxxxxxxxxxxxx
Subject: robust archiving of WAL segments

I have a Postgres 8.4.X database that is subject to a fairly high ingest rate. It is not uncommon for the system to generate 1-3 WAL segments of 16 MB every minute. I endeavor to keep these WAL segments around both for maintaining standby servers and for disaster recovery. I have over time ended up with a server architecture that looks like this:

* the master database has an archive_command that copies WAL segments out of pg_xlog and puts them in another local directory that we'll call wal_segment_staging_area
* a daemon shovels files out of wal_segment_staging_area using rsync and places them on a separate file server in a directory we'll call wal_segment_archive
* standby servers run in continuous recovery mode with a restore_command that sources WAL segments from the file server's wal_segment_archive directory

The major considerations that governed this design are:

* the high rate of inflow, which generates a large number of segments quickly
* the desire not to have the WAL segments hosted on the master database server long-term, for performance and fault-tolerance reasons
* the fear of having an archive_command that requires network connectivity to another server, the connectivity getting severed, and pg_xlog filling up and wedging the master server (see the sketch after this list)
* the problem that if you fall substantially behind due to a transient networking problem you want to be able to catch up as quickly as possible, which among other things means you don't want to open a separate remote connection to transfer every segment in a loop, hence rsync
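A minimal sketch of what such a staging-area archive_command wrapper might look like, assuming a script installed as /usr/local/bin/archive_wal.sh and a staging directory at /var/lib/pgsql/wal_segment_staging_area (both names are hypothetical); it copies the segment locally and forces it to disk before reporting success, which also bears on the durability concerns discussed further down:

#!/bin/bash
# Minimal sketch of a staging-area archive_command wrapper.  All paths and
# the script name are hypothetical.  Configured in postgresql.conf as:
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'
set -e

SRC="$1"      # %p: path of the WAL file to archive, as supplied by Postgres
NAME="$2"     # %f: bare segment file name
STAGING=/var/lib/pgsql/wal_segment_staging_area
TMP="$STAGING/$NAME.tmp"
DEST="$STAGING/$NAME"

# Refuse to overwrite an existing non-empty copy (per the docs' advice),
# but treat a zero-length leftover from a crashed earlier attempt as absent
# so that archiving does not wedge on it forever.
if [ -s "$DEST" ]; then
    cmp -s "$SRC" "$DEST" && exit 0   # already archived with identical contents
    echo "archive_wal.sh: refusing to overwrite $DEST" >&2
    exit 1
fi

# Copy under a temporary name, force the data to disk, rename into place,
# then force the directory entry to disk.  Per-file sync needs GNU
# coreutils >= 8.24; older systems would need a small helper that calls fsync().
cp "$SRC" "$TMP"
sync "$TMP"
mv "$TMP" "$DEST"
sync "$STAGING"
exit 0

Removing the segment from pg_xlog remains Postgres's job; the script only has to guarantee that it returns success to Postgres no earlier than the moment the staged copy is durably on disk.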
Over the past couple of years that this arrangement has been in operation, I have on a handful of occasions seen what I believe to be the following series of events:

* a transfer of a WAL segment occurs, either from pg_xlog to wal_segment_staging_area or from wal_segment_staging_area to wal_segment_archive
* the process transferring the file believes that the operation has succeeded and deletes its local copy of the file
* the server that received the file experiences a transient fault that causes it to wedge, and an administrator does a hard reset out of desperation
* when the receiving server comes back up, the file that was created during the archive operation is there, but it is zero-length, presumably indicating that from the standpoint of the application the write operation had succeeded, but the file's contents were still cached and never durably sync'd to disk
* when this has happened to a file being placed in wal_segment_staging_area, a subsequent power-on of the master database has yielded repeated error messages in the log files saying that the archive_command had failed (because, following the Postgres docs' advice, the archive_command tests whether the file already exists), and eventually Postgres says "I give up trying to archive this WAL segment!" and deletes it from pg_xlog without it ever being archived, which blew my mind the first time I saw it happen because I thought the archive_command was supposed to be retried indefinitely until success was reported
* when this has happened to a file being placed in wal_segment_archive, the effect is similar, although you don't get any error messages from Postgres, just an archive that contains some zero-length WAL segments
* in both cases you end up with a corrupted WAL segment stream, such that, until you have made a new base backup, a point-in-time restore would only be able to recover up to the point of the event, because a subset of the WAL is lost forever

Thoughts and questions:

* it seems that robust archiving of WAL segments requires that your scripts be as pedantic about transferring the files as Postgres is about creating them; Postgres doesn't report a transaction as committed until the relevant WAL data has been durably written to disk, and your archiving scripts cannot afford to delete the source file until the destination file has likewise been durably written
* is it a bug that Postgres gives up trying to archive a WAL segment after several failed tries? or have I somehow misconfigured something? maybe, after the fashion of implementing a restore_command for continuous recovery mode (i.e. a restore_command that does not return until a file has been restored), I ought likewise to implement an archive_command that does not return until the file has been successfully archived?
* the Postgres docs chapter "Continuous Archiving and Point-in-Time Recovery" provides a lot of good information about rigging up an environment for dealing with all of this, but to my knowledge it gives no treatment to the matters I describe here, which may have left a lot of installations in a somewhat precarious situation; there probably ought to be a treatment of ensuring that transmitted files have been written durably
* I'm not even sure what a good solution is for guaranteeing that transmitted files have been durably persisted using common tools; commonly available rsync implementations don't seem to support a "please call fsync" option, though some Googling turns up discussion of a patched version someone created for that purpose; maybe I could invoke the shell command "sync" as part of the dance, but that doesn't seem great either: the first transfer happens on the master database, where I don't want to issue a sync request against all file systems because that will kill database performance, and the second transfer happens via rsync, where you wouldn't be able to call "sync" until rsync had already deleted the source files, thus creating a race condition (one possible workaround is sketched below)
* I can imagine what full solutions would look like, but not ones that don't involve a fair amount of custom code to solve what feels like it ought to be a solved problem

How have you folks dealt with needing this level of robustness?

-- AWG
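One way to avoid the sync/rsync race described above is to split the shovel daemon's cycle into transfer, remote flush, and verified delete. The following is a sketch only, under the assumptions that the archive file server is reachable as archive-host over passwordless ssh and that the directory names match the ones used in this thread (all of these are hypothetical):

#!/bin/bash
# Sketch of a shovel script that does not rely on rsync's own source
# deletion: transfer first, flush the receiver, then delete only files
# whose remote copies are confirmed.  Host name and paths are hypothetical.
set -e

STAGING=/var/lib/pgsql/wal_segment_staging_area
REMOTE=archive-host
REMOTE_DIR=/srv/wal_segment_archive

# 1. Copy everything currently staged, WITHOUT --remove-source-files, so a
#    crash on the receiver cannot lose data that no longer exists locally.
#    In-progress temporaries (the .tmp convention from the earlier sketch)
#    are skipped.
rsync -a --exclude='*.tmp' "$STAGING"/ "$REMOTE:$REMOTE_DIR"/

# 2. Flush the receiver's caches.  A plain sync is acceptable there because
#    the file server is not the database host.
ssh "$REMOTE" sync

# 3. Delete local copies only where the remote counterpart exists with the
#    same size (GNU stat syntax).
for f in "$STAGING"/*; do
    [ -f "$f" ] || continue
    name=$(basename "$f")
    local_size=$(stat -c %s "$f")
    remote_size=$(ssh "$REMOTE" "stat -c %s '$REMOTE_DIR/$name'" 2>/dev/null || echo missing)
    if [ "$remote_size" = "$local_size" ]; then
        rm -- "$f"
    fi
done

Because nothing is deleted until after the receiver has been flushed and the sizes match, the source files cannot disappear before the sync happens; the worst case after a crash is re-transferring segments that had already been shipped.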