Re: WAL archiving to network drive

On Wed, 20 Aug 2008, Glen Parker wrote:

> Greg Smith wrote:
>> 2) If there is a problem with the network drive, the slow/blocked network write is not happening where the database is waiting for it.
>
> This is not really a consideration, is it? I was under the impression that no other database activity blocks waiting for the archiver.

The database will keep accumulating WAL segments it can't recycle if the archiver keeps failing, which can make the pg_xlog directory (often mounted on a separate, smaller partition or disk) grow dramatically. You do not want to be the guy whose database went down because the xlog disk filled up after some network mount flaked out. I've seen that far too many times in WAN environments where the remote location was unreachable for days (after a natural disaster, for example); since pg_xlog never got very big under normal operation, it was never sized for that.
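A minimal sketch of a watchdog for that failure mode, checking how many segment files (24-hex-character names) are piling up in pg_xlog. The helper name, threshold, and paths are all illustrative assumptions, not any standard tool:

```shell
# count_wal_backlog DIR THRESHOLD
# Hypothetical helper: warn when more than THRESHOLD WAL segment files
# (24 hex-character names) are sitting in DIR, e.g. because the
# archive_command has been failing for a while.
count_wal_backlog() {
    dir=$1
    threshold=$2
    count=$(ls "$dir" 2>/dev/null | grep -c '^[0-9A-F]\{24\}$')
    if [ "$count" -gt "$threshold" ]; then
        echo "WARNING: $count WAL segments in $dir (threshold $threshold)"
    fi
}

# Example cron usage (path and threshold are assumptions):
# count_wal_backlog /var/lib/pgsql/data/pg_xlog 100
```

Run it from cron well before the partition's real capacity so the alert fires while there is still time to fix the mount.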

It will also slow things down a bit under heavy write loads, as every segment change will result in creating a new segment file rather than re-using an old one.

You also don't want to be the guy who has to explain why the database is taking hours to come back up after a crash with 4000 WAL segments to replay, because archiving failed for a long time and prevented proper checkpoints (ask Robert Treat if you don't believe me; he too was once that guy). While occasional archiving failures are tolerated just fine and you can catch up later, a design that presumes it's OK for them to persist for extended periods is not optimal. A crash after an excess of segments has accumulated can mean a long stretch of replay downtime for the server, and in some environments (where the primary is highly redundant, for example) that risk is worse than declaring your PITR backup hosed and simply throwing away the segments you couldn't transfer.

>> 3) In an emergency where the usual network drive is unavailable, you can temporarily change your archive strategy without touching the database server's configuration.
>
> That's just as true with the direct write method. You simply change the archive script to do nothing and return a failed exit status. The DBMS will keep trying until you change it back, and then everything catches up and proceeds as normal.

I've already covered above why assuming archive failures have zero cost can be bad. There are some additional issues here as well.

When's the last time you tried to get a DBA to send a "kill -HUP" to a production database to get it to recognize that archive_command was changed in the postgresql.conf? Sure, we all know it's not supposed to ever hurt the server, but Murphy is a tricky character and lots of people are (rightly) paranoid. I think we've all known admins whose fingers were trained to only ever type "kill -9" no matter what signal they meant to send.

OK, maybe you're smarter than that and use a separate script. DBAs are also not happy changing a script that gets called by the database every couple of minutes, and as soon as more than one piece is involved it can be difficult to update that script atomically.
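The pause-and-retry behavior under discussion is often driven by a flag file, so the script itself never has to be edited mid-flight (creating or removing a file is atomic). A sketch, where the function name and every path are illustrative assumptions rather than anything from the original posts:

```shell
# Hypothetical archive_command wrapper: PostgreSQL retries any segment
# whose command exits non-zero, so a flag file can pause archiving
# without editing the script or reloading the server.
#
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'   (assumed path)
#
archive_wal() {
    wal_path=$1   # %p: segment path, relative to the data directory
    wal_name=$2   # %f: segment file name
    pause_flag=${PAUSE_FLAG:-/var/lib/pgsql/archive_paused}
    archive_dir=${ARCHIVE_DIR:-/mnt/netdrive/wal_archive}

    # While the flag exists, fail on purpose; the server keeps the
    # segment in pg_xlog and retries later.
    if [ -e "$pause_flag" ]; then
        return 1
    fi
    # Refuse to overwrite an already-archived copy.
    if [ -e "$archive_dir/$wal_name" ]; then
        return 1
    fi
    cp "$wal_path" "$archive_dir/$wal_name"
}
```

To pause archiving you would touch the flag file; removing it lets the server catch up on its own, with no HUP and no script change.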

Anytime you can decouple failure-recovery cleanup (like a network drive disconnection screwing up archiving) completely from routine database operation, you gain flexibility in how easily and safely you can recover from that failure. I assure you that all the suggestions I made come from painful and completely unexpected messes.
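The decoupling being argued for is commonly realized as a two-stage design: archive_command does nothing but a fast local copy into a spool directory, and a separate cron job drains the spool to the network drive, so a flaky mount never touches the server's own archiving loop. A sketch of the second stage, with the function name and all paths being assumptions:

```shell
# Stage 1 (fast, local, called by the server):
#   archive_command = 'cp %p /var/spool/pg_wal_archive/%f'   (assumed path)
#
# Stage 2 (cron job, fully decoupled from the server): drain the spool
# to the network drive, deleting each segment only after a good copy.
push_spool() {
    spool=$1
    remote=$2
    for seg in "$spool"/*; do
        [ -e "$seg" ] || continue      # empty spool: glob matched nothing
        if cp "$seg" "$remote/"; then
            rm "$seg"
        else
            return 1                   # mount unhappy; retry next run
        fi
    done
}
```

If the network drive disappears, segments simply wait in the local spool; the database never sees a failure, and cleanup happens entirely outside it.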

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

