On Thu, Apr 9, 2009 at 6:38 PM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:

> What does vmstat say about the bi/bo during this time period? It sounds
> like the volume of random I/O produced by recovery is just backing up as
> expected. Some quick math:

I'll have to capture this; unfortunately I won't be able to do that until
tomorrow. The machine I was looking at has already been failed over, and
I'm currently creating a new snapshot. I won't have a new hot spare to
replace it until the morning.

> 15GB RAM * 5% dirty_ratio = 750MB ; there's where your measured 760MB
> bottleneck is coming from.

That was what I thought; good to have it confirmed by somebody else.

> 750MB / 10 minutes = 1.25MB/s ; that's in the normal range for random
> writes with a single disk

Yes, and this is an interesting problem I'm having; more on it below...

> Therefore my bet is that "vmstat 1" will show bo~=1250 the whole time
> you're waiting there, with matching figures from the iostat to the
> database disk during that period.
>
> Basically your options here are:
>
> 1) Decrease the maximum possible segment backlog so you can never get
> this far behind

I understand conceptually what you are saying, but I don't know how to
practically realize this. :)  Do you mean lower checkpoint_segments?

> 2) Increase the rate at which random I/O can be flushed to disk by either
>   a) Improving things with a [better] battery-backed controller disk cache
>   b) Stripe across more disks

This is the problem that has been my nightmare for the past few months.
It actually is an 8-drive RAID 10, BUT it's on virtualized infrastructure
up in Amazon's cloud, running on 8 EBS volumes. I've found performance to
be... inconsistent at best. Sometimes it's great, sometimes it's not so
great.

We have a legacy database (~120GB) which grew in our old data center on
very powerful hardware. We moved it up to Amazon's cloud a few months ago
and have been scrambling ever since. I wouldn't change what we're doing;
the benefits so far have outweighed the pain, and we're actively working
on the software to make better use of the cloud infrastructure (i.e. many
smaller databases instead of one big database, lots of caching, the usual
stuff). Unfortunately, that takes time, and I'm trying to limp along as
best I can with the legacy database until we can get everything migrated.

So, to recap: I've RAIDed up the volumes, thrown as much RAM and CPU at
the process as is available, and just can't seem to tease any more
performance out.

Thanks,
Bryan
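
P.S. When I do get a chance to capture this tomorrow, I assume something
along these lines is what you're after (a rough sketch; the log file
names and the one-second interval are just what I'd pick, and I'd point
iostat at whichever devices back the database volume):

    # log block-in/block-out every second during the catch-up window;
    # bo hovering around ~1250 would line up with the 1.25MB/s estimate above
    vmstat 1 > vmstat.log &

    # extended per-device stats (utilization, service times) for the same window
    iostat -x 1 > iostat.log &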