Re: Minimizing Recovery Time (wal replication)

Greg Smith <gsmith@xxxxxxxxxxxxx> · Thu, 9 Apr 2009 19:38:15 -0400 (EDT)

On Thu, 9 Apr 2009, Bryan Murphy wrote:

> (1) hot spare applies 70 to 75 wal files (~1.1g) in 2 to 3 min period

Yeah, if you ever let this many files queue up, you're facing a long 
recovery time.  You really need to be applying WAL files often enough 
that you never fall this far behind.
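
To put a number on the backlog, here is a rough Python sketch that counts 
the WAL segments sitting in a directory and converts the count to megabytes.  
It assumes the default 16MB segment size, and the directory you pass in is 
whatever location your own setup ships segments to; both function names are 
made up for illustration.

```python
import re
from pathlib import Path

# WAL segment file names are 24 hexadecimal characters.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")
SEGMENT_MB = 16  # assumption: default WAL segment size


def pending_segments(archive_dir):
    """Count files in archive_dir whose names look like WAL segments."""
    return sum(
        1
        for p in Path(archive_dir).iterdir()
        if p.is_file() and SEGMENT_NAME.match(p.name)
    )


def backlog_mb(segment_count, segment_mb=SEGMENT_MB):
    """Approximate volume of WAL waiting to be applied, in MB."""
    return segment_count * segment_mb


if __name__ == "__main__":
    # The 70 to 75 files reported above work out to roughly 1.1GB to 1.2GB.
    for n in (70, 75):
        print(f"{n} segments ~= {backlog_mb(n)} MB")
```

Running something like pending_segments() from cron and alerting when the 
count grows is one way to notice a backlog before it gets this large.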

> (2) hot spare pauses for 15 to 20 minutes, during this period pdflush
> consumes 99% IO (iotop).  Dirty (from /proc/meminfo) spikes to ~760mb,
> remains at that level for the first 10 minutes, and then slowly ticks
> down to 0 for the second 10 minutes.

What does vmstat say about the bi/bo during this time period?  It sounds 
like the volume of random I/O produced by recovery is just backing up as 
expected.  Some quick math:

15GB RAM * 5% dirty_ratio = 750MB; that's where your measured 760MB 
bottleneck is coming from.

750MB / 10 minutes = 1.25MB/s; that's in the normal range for random 
writes to a single disk.

Therefore my bet is that "vmstat 1" will show bo~=1250 the whole time 
you're waiting there, with matching figures from iostat for the 
database disk during that period.
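
The same arithmetic as a small Python sketch, in case you want to plug in 
your own figures.  The function names are hypothetical, and it uses the 
same round 1GB = 1000MB conversion as the estimate above.  vmstat reports 
bo in blocks per second, with blocks of 1024 bytes on Linux, so 1.25MB/s 
comes out at about 1250.

```python
def dirty_limit_mb(ram_gb, dirty_ratio_pct):
    """Dirty memory allowed before writers are throttled, in MB."""
    return ram_gb * 1000 * dirty_ratio_pct / 100


def drain_rate_mb_s(dirty_mb, minutes):
    """Average write rate needed to flush dirty_mb in the given time."""
    return dirty_mb / (minutes * 60)


def expected_bo(rate_mb_s):
    """Approximate vmstat 'bo' figure, treating 1MB as 1000 blocks."""
    return rate_mb_s * 1000


if __name__ == "__main__":
    limit = dirty_limit_mb(ram_gb=15, dirty_ratio_pct=5)
    rate = drain_rate_mb_s(limit, minutes=10)
    print(f"dirty limit: {limit:.0f} MB")
    print(f"drain rate:  {rate:.2f} MB/s")
    print(f"vmstat bo:   ~{expected_bo(rate):.0f}")
```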

Basically your options here are:

1) Decrease the maximum possible segment backlog so you can never get this
   far behind
2) Increase the rate at which random I/O can be flushed to disk by either
   a) Improving things with a [better] battery-backed controller disk cache
   b) Striping across more disks
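
A back-of-the-envelope sketch of how those options change the wait, in 
Python.  It assumes random-write throughput scales linearly with the number 
of striped disks, which real arrays only approximate, and that a smaller 
backlog leaves proportionally less dirty data to flush.  The function name 
and the example figures other than 750MB and 1.25MB/s are illustrative only.

```python
def flush_minutes(dirty_mb, per_disk_mb_s, disks=1):
    """Minutes to flush dirty_mb, assuming throughput scales with disks."""
    return dirty_mb / (per_disk_mb_s * disks) / 60


if __name__ == "__main__":
    # Baseline from the figures above: 750MB at 1.25MB/s on one disk.
    print(f"1 disk, 750MB: {flush_minutes(750, 1.25):.1f} min")
    # Option 1: half the backlog, so (by assumption) half the dirty data.
    print(f"1 disk, 375MB: {flush_minutes(375, 1.25):.1f} min")
    # Option 2b: stripe across four disks (assumed linear scaling).
    print(f"4 disks, 750MB: {flush_minutes(750, 1.25, disks=4):.1f} min")
```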

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)