Re: WAL archiving to network drive

On Wed, 20 Aug 2008, Glen Parker wrote:

> Greg Smith wrote:
>> 2) If there is a problem with the network drive, the slow/blocked network write is not happening where the database is waiting for it.
>
> This is not really a consideration, is it? I was under the impression that no other database activity blocks waiting for the archiver.

The database will keep accumulating WAL segments it can't recycle if the archiver keeps failing, which can make the pg_xlog directory (often mounted on a separate, smaller partition or disk) grow dramatically. You do not want to be the guy whose database went down because the xlog disk filled up after some network mount flaked out. I've seen that far too many times in WAN environments where the remote location was unreachable for days (after a natural disaster, for example); since pg_xlog never got very big under normal operation, it was never sized for that.
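A minimal sketch of a watchdog for that failure mode, checking how many segment files (24-hex-character names) are piling up in pg_xlog. The helper name, threshold, and paths are all illustrative assumptions, not any standard tool:

```shell
# count_wal_backlog DIR THRESHOLD
# Hypothetical helper: warn when more than THRESHOLD WAL segment files
# (24 hex-character names) are sitting in DIR, e.g. because the
# archive_command has been failing for a while.
count_wal_backlog() {
    dir=$1
    threshold=$2
    count=$(ls "$dir" 2>/dev/null | grep -c '^[0-9A-F]\{24\}$')
    if [ "$count" -gt "$threshold" ]; then
        echo "WARNING: $count WAL segments in $dir (threshold $threshold)"
    fi
}

# Example cron usage (path and threshold are assumptions):
# count_wal_backlog /var/lib/pgsql/data/pg_xlog 100
```

Run it from cron well before the partition's real capacity so the alert fires while there is still time to fix the mount.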

It will also slow things down a bit under heavy write loads, as every segment change will result in creating a new segment file rather than re-using an old one.

You also don't want to be the guy who has to explain why the database is taking hours to come back up after a crash with 4000 WAL segments to replay, because archiving failed for a long time and prevented proper checkpoints (ask Robert Treat if you don't believe me; he too was once that guy). While occasional archiving failures are tolerated just fine and you can catch up later, a design that presumes it's OK for them to persist for extended periods is not optimal. A crash after an excess of segments has accumulated can mean a long stretch of replay downtime for the server, and in some environments (where the primary is highly redundant, for example) that risk is worse than declaring your PITR backup hosed and simply throwing away the segments you couldn't transfer.

>> 3) In an emergency where the usual network drive is unavailable, you can temporarily change your archive strategy without touching the database server's configuration.
>
> That's just as true with the direct write method. You simply change the archive script to do nothing and return a failed exit status. The DBMS will keep trying until you change it back, and then everything catches up and proceeds as normal.

I've already covered above why assuming archive failures have zero cost can be bad. There are some additional issues here as well.

When's the last time you tried to get a DBA to send a "kill -HUP" to a production database to get it to recognize that archive_command was changed in the postgresql.conf? Sure, we all know it's not supposed to ever hurt the server, but Murphy is a tricky character and lots of people are (rightly) paranoid. I think we've all known admins whose fingers were trained to only ever type "kill -9" no matter what signal they meant to send.

OK, maybe you're smarter than that and use a separate script. DBAs are also not happy changing a script that gets called by the database every couple of minutes, and as soon as more than one piece is involved it can be difficult to update that script atomically.
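The pause-and-retry behavior under discussion is often driven by a flag file, so the script itself never has to be edited mid-flight (creating or removing a file is atomic). A sketch, where the function name and every path are illustrative assumptions rather than anything from the original posts:

```shell
# Hypothetical archive_command wrapper: PostgreSQL retries any segment
# whose command exits non-zero, so a flag file can pause archiving
# without editing the script or reloading the server.
#
#   archive_command = '/usr/local/bin/archive_wal.sh %p %f'   (assumed path)
#
archive_wal() {
    wal_path=$1   # %p: segment path, relative to the data directory
    wal_name=$2   # %f: segment file name
    pause_flag=${PAUSE_FLAG:-/var/lib/pgsql/archive_paused}
    archive_dir=${ARCHIVE_DIR:-/mnt/netdrive/wal_archive}

    # While the flag exists, fail on purpose; the server keeps the
    # segment in pg_xlog and retries later.
    if [ -e "$pause_flag" ]; then
        return 1
    fi
    # Refuse to overwrite an already-archived copy.
    if [ -e "$archive_dir/$wal_name" ]; then
        return 1
    fi
    cp "$wal_path" "$archive_dir/$wal_name"
}
```

To pause archiving you would touch the flag file; removing it lets the server catch up on its own, with no HUP and no script change.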

Anytime you can decouple failure-recovery cleanup (like a network drive disconnection screwing up archiving) completely from routine database operation, you gain flexibility in how easily and safely you can recover from that failure. I assure you that all the suggestions I made come from painful and completely unexpected messes.
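The decoupling being argued for is commonly realized as a two-stage design: archive_command does nothing but a fast local copy into a spool directory, and a separate cron job drains the spool to the network drive, so a flaky mount never touches the server's own archiving loop. A sketch of the second stage, with the function name and all paths being assumptions:

```shell
# Stage 1 (fast, local, called by the server):
#   archive_command = 'cp %p /var/spool/pg_wal_archive/%f'   (assumed path)
#
# Stage 2 (cron job, fully decoupled from the server): drain the spool
# to the network drive, deleting each segment only after a good copy.
push_spool() {
    spool=$1
    remote=$2
    for seg in "$spool"/*; do
        [ -e "$seg" ] || continue      # empty spool: glob matched nothing
        if cp "$seg" "$remote/"; then
            rm "$seg"
        else
            return 1                   # mount unhappy; retry next run
        fi
    done
}
```

If the network drive disappears, segments simply wait in the local spool; the database never sees a failure, and cleanup happens entirely outside it.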

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

