Re: PG engine takeover or switch over

Hannes Dorbath wrote:
On 15.08.2007 21:30, Medi Montaseri wrote:
I am looking for some suggestions to optimize the following
problem/solution.

Given two nodes A and B (two computers) in an active-passive configuration, where A is running the PG engine, B is simply standing by, and both are attached to a common (twin-tailed) storage (from pg_ctl's point of view, the same -D /common/data), I am looking for a faster solution during takeover, where A has crashed and B is to start the PG engine and run with it.

My current solution is to start the PG engine, which should take little time to read the configuration files and /common/data and fork a few children. But I am still interested in optimizing this start-up cost.

For example, would I gain anything by starting the PG engine on both A and B, but on B pointing it to /common/dummy, and during the takeover somehow telling it to read from /common/data instead, for example by having two postmaster.conf files or two PGDATA settings and then using pg_ctl reload?

Starting up PostgreSQL should be very fast on decent hardware, given that there is no recovery to be done.

PostgreSQL does not fork a lot unless it is accepting new connections, and if reading a config file is slow on your system, something else is broken.

In an active/passive setup you should be able to switch over in under 3 seconds. If there was a lot of load on the failed node, the recovery time on the new active node increases. The only thing you can do about that is get faster disks.
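
Just as a rough sketch of what the takeover on B could look like (the device path, log file location, and fencing mechanism below are made up for illustration, not part of your setup):

    # 1. Fence node A so it can no longer write to the shared volume
    #    (how depends on your cluster manager / STONITH device).
    # 2. Mount the twin-tailed storage on node B (hypothetical device path).
    mount /dev/shared_vg/pgdata /common/data
    # 3. Start PostgreSQL against the shared data directory; if A died
    #    uncleanly, it will run WAL replay (crash recovery) automatically.
    pg_ctl start -D /common/data -l /tmp/takeover.log

Most of the wall-clock time in that last step is the recovery, not the process start itself.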

Your startup time in the event of a failure will depend on the amount of WAL that needs to be replayed to perform automatic crash recovery. For example, if you've set checkpoint_segments to some high number, PostgreSQL will need to replay that much WAL to ensure that transactions that were committed are correctly represented in your on-disk object data (tables, indexes, etc.).
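
As a rough way to gauge the worst case (assuming the default 16 MB WAL segment size; the checkpoint_segments value below is just an example):

    # WAL written since the last completed checkpoint is roughly bounded by
    # checkpoint_segments worth of segments, i.e. about:
    #   checkpoint_segments * 16 MB
    # e.g. checkpoint_segments = 64  ->  on the order of 1 GB to scan
    # You can also see how many segments are currently being kept around:
    ls /common/data/pg_xlog | wc -l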

Note that in the case of a "graceful" shutdown, you'll find that PostgreSQL doesn't need to replay WAL files (since it checkpoints prior to shutting down), and as such the startup time is pretty fast.
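
For what it's worth, the shutdown mode you use matters here; with the standard pg_ctl modes:

    # "smart" waits for clients, "fast" disconnects them; both write a
    # shutdown checkpoint, so the next startup needs no WAL replay.
    pg_ctl stop -D /common/data -m fast
    # "immediate" skips the checkpoint and is treated like a crash, so the
    # next startup will have to replay WAL.
    pg_ctl stop -D /common/data -m immediate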

You can decrease the amount of time it takes to recover by decreasing the number of segments per checkpoint; however, this may result in more frequent checkpointing and, as a result, a reduction in overall performance on your active node.

Basically, you're balancing the "cost" of auto crash recovery with the frequency of checkpointing.
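
A quick illustration of that trade-off in postgresql.conf (the numbers are made-up examples, not recommendations):

    # Smaller values mean less WAL to replay after a crash, but more
    # frequent checkpoints (more I/O) while the node is active.
    checkpoint_segments = 8         # faster crash recovery, more checkpoint I/O
    #checkpoint_segments = 64       # better throughput, longer recovery
    checkpoint_timeout = 5min       # time-based checkpoint trigger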

An alternative (and perhaps faster) method of failover would be to run a cluster in "warm standby" mode, which wouldn't rely on a shared pgdata directory. In that case you would only have to wait for the last few WAL files to be replayed in order to recover. With a high checkpoint_segments number this allows you to recover quickly from failures, at the cost of some PITR overhead (copying WAL files as they fill up); but that's a pretty small cost, and you may already be paying it if you do PITR backups.

There is no way to have PostgreSQL "switch" data directories to speed up startup.
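
If you go the warm-standby route, the WAL shipping side is just the usual archive/restore hooks; a minimal sketch (the archive path, standby host name, and use of pg_standby are assumptions, not requirements):

    # On the primary, in postgresql.conf: ship each completed WAL file off-box.
    archive_command = 'rsync -a %p standby:/wal_archive/%f'

    # On the standby, in recovery.conf: keep restoring archived WAL as it
    # arrives, using pg_standby or a similar wait-for-file script.
    restore_command = 'pg_standby /wal_archive %f %p'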

--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC  27560
919-463-0999/866-229-3386
http://www.otg-nc.com


