Re: Huge iowait during checkpoint finish

Anton Belyaev <anton.belyaev@xxxxxxxxx> · Mon, 11 Jan 2010 13:50:47 +0300

Hello Greg,

Thanks for your extensive reply.

2010/1/9 Greg Smith <greg@xxxxxxxxxxxxxxx>:
> Anton Belyaev wrote:
>>
>> I think all the IOwait comes during sync time, which is 80 s,
>> according to the log entry.
>>
>
> I believe you are correctly diagnosing the issue.  The "sync time" entry in
> the log was added there specifically to make it easier to confirm this
> problem you're having exists on a given system.
>
>> bgwriter_lru_maxpages = 0 # BG writer is off
>> checkpoint_segments = 45
>> checkpoint_timeout = 60min
>> checkpoint_completion_target = 0.9
>>
>
> These are reasonable settings.  You can look at pg_stat_bgwriter to get more
> statistics about your checkpoints; grab a snapshot of that now, another one
> later, and then compute the difference between the two.  I've got an example
> of that http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>
> You should be aiming to have a checkpoint no more than every 5 minutes, and
> on a write-heavy system shooting for closer to every 10 is probably more
> appropriate.  Do you know how often they're happening on yours?  Two
> pg_stat_bgwriter snapshots from a couple of hours apart, with a timestamp on
> each, can be used to figure that out.
>

Checkpoints happen about once an hour, sometimes a bit more often
(every 30 minutes) during daily peaks.
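
For reference, the two-snapshot arithmetic suggested above can be
sketched in Python. This is only an illustration: the dict layout and
the sample numbers are made up, and only the counter names
checkpoints_timed and checkpoints_req come from pg_stat_bgwriter.

```python
from datetime import datetime


def checkpoint_interval_minutes(first, second):
    """Average minutes between checkpoints across two snapshots.

    Each snapshot is a dict (hypothetical layout) holding 'taken_at'
    (a datetime) plus the pg_stat_bgwriter counters 'checkpoints_timed'
    and 'checkpoints_req'. Returns None if no checkpoint completed
    between the two snapshots.
    """
    elapsed_min = (second["taken_at"] - first["taken_at"]).total_seconds() / 60.0
    timed = second["checkpoints_timed"] - first["checkpoints_timed"]
    requested = second["checkpoints_req"] - first["checkpoints_req"]
    total = timed + requested
    if total <= 0:
        return None
    return elapsed_min / total


# Made-up sample values, four hours apart.
before = {"taken_at": datetime(2010, 1, 11, 10, 0),
          "checkpoints_timed": 100, "checkpoints_req": 5}
after = {"taken_at": datetime(2010, 1, 11, 14, 0),
         "checkpoints_timed": 103, "checkpoints_req": 6}

print(checkpoint_interval_minutes(before, after))
```

A large share of checkpoints_req in the difference would suggest that
checkpoint_segments, not checkpoint_timeout, is triggering most
checkpoints.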

>> I had mostly the same config with my 8.3 deployment.
>> But hardware is different:
>> Disk is software RAID-5 with 3 hard drives.
>> Operating system is Ubuntu 9.10 Server x64.
>>
>
> Does the new server have a lot more RAM than the 8.3 one?  Some of the
> problems in this area get worse the more RAM you've got.
>

Yes, the new server has 12 GB while the old one has only 8 GB.

> Does the new server use ext4 while the old one used ext3?
>

Same ext3 filesystem.

> Basically, you have a couple of standard issues here:
>
> 1) You're using RAID-5, which is not known for good write performance.  Are
> you sure the disk array performs well on writes?  And if you didn't
> benchmark it, you can't be sure.
>

I did some dd benchmarks (following
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm):
The old server with its "hardware RAID-1" shows 60 MB/s on write.
The new server with software RAID-5 shows 85 MB/s on write.
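
For anyone who wants to repeat a rough sequential write test without
dd, here is a minimal Python sketch. It is not the procedure from the
page above; the function name and the sizes are my own choices. A
meaningful run needs a file much larger than RAM, otherwise the number
mostly reflects the page cache.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(path, total_bytes, block_size=8192):
    """Write total_bytes of zeros to path, fsync, and return MB/s.

    The fsync is inside the timed region so that data still sitting in
    the OS write cache counts against the result.
    """
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return (written / 1e6) / elapsed


if __name__ == "__main__":
    # Tiny demo size only; far too small to say anything about the disks.
    with tempfile.TemporaryDirectory() as tmp_dir:
        demo_path = os.path.join(tmp_dir, "seqwrite.bin")
        rate = sequential_write_mb_per_s(demo_path, 16 * 1024 * 1024)
        print("%.1f MB/s" % rate)
```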

> 2) Linux is buffering a lot of writes that are only making it to disk at
> checkpoint time.  This could be simply because of (1)--maybe the disk is
> always overloaded.  But it's possible this is just due to excessive Linux
> buffering being lazy about the writes.  I wrote something about that topic
> at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html you
> might find interesting.
>

Old server has
dirty_ratio = 10
dirty_background_ratio = 5

New server had
dirty_ratio = 20
dirty_background_ratio = 10

Given all the tests and measurements above:

The new server has more RAM, leaving Linux more room for write cache.
During the dd test, Dirty in /proc/meminfo went up to 2 GB.
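
To put the ratios above into absolute numbers, here is a rough Python
sketch. It assumes the ratios apply to total RAM, which overstates
things: as far as I understand, the kernel applies them to available
memory, so these are upper bounds. The helper names are made up.

```python
def dirty_thresholds_mb(ram_gb, dirty_ratio, dirty_background_ratio):
    """Return (background_mb, hard_limit_mb), assuming ratios of total RAM."""
    ram_mb = ram_gb * 1024
    background_mb = ram_mb * dirty_background_ratio / 100.0
    hard_limit_mb = ram_mb * dirty_ratio / 100.0
    return background_mb, hard_limit_mb


def seconds_to_flush(dirty_mb, write_mb_per_s):
    """Best-case time to write dirty_mb at a sequential write rate."""
    return dirty_mb / write_mb_per_s


old_bg, old_hard = dirty_thresholds_mb(8, dirty_ratio=10, dirty_background_ratio=5)
new_bg, new_hard = dirty_thresholds_mb(12, dirty_ratio=20, dirty_background_ratio=10)

print("old server: background %.0f MB, hard limit %.0f MB" % (old_bg, old_hard))
print("new server: background %.0f MB, hard limit %.0f MB" % (new_bg, new_hard))

# 2 GB of dirty data at the 85 MB/s measured with dd (sequential best case).
print("flush 2 GB: %.0f s" % seconds_to_flush(2048, 85))
```

Under these assumptions the new server may hold about three times as
much dirty data as the old one before background writeback starts.
Checkpoint writes are not sequential, so the real flush time can be
much longer than the best-case figure.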

RAID-5 is a bit faster (at least on sequential writes). The drives
aren't overloaded, because their utilization during the lengthy write
phase of the checkpoint is low. The iowait problems occur only during
the final sync part of the checkpoint, and during this short period
the drives are almost 100% utilized (according to sar -d 1).
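
To see how much dirty data accumulates before the sync phase, the
Dirty and Writeback lines of /proc/meminfo can be sampled during a
checkpoint. A minimal Python sketch follows; the function names are
made up, and only the /proc/meminfo field names are real.

```python
def parse_meminfo_kb(text, fields=("Dirty", "Writeback")):
    """Extract the requested fields (in kB) from /proc/meminfo text."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Exact name match, so "WritebackTmp" is not confused with "Writeback".
        if sep and name in fields and parts:
            values[name] = int(parts[0])
    return values


def read_meminfo_kb(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo_kb(f.read())


# Made-up sample input in the /proc/meminfo layout.
sample = """MemTotal:       12300000 kB
Dirty:           2097152 kB
Writeback:          1024 kB
WritebackTmp:          0 kB
"""
print(parse_meminfo_kb(sample))
```

Calling read_meminfo_kb() once a second while a checkpoint runs, and
logging the values next to the sar output, would show whether Dirty
keeps growing until the sync starts.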

I experimented a bit with setting dirty_background_ratio = 1, but this
somehow had a negative effect.
This is strange: I hoped it would force the load to be spread from the
2 min sync period across the 1 hour checkpoint span, but it did not.

As a result, I still don't know where the real problem is. The drives
aren't overloaded. The Linux cache is really mysterious, but modifying
its parameters does not give the desired effect.

Thanks.
Anton.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general