Re: General performance/load issue

"Tomas Vondra" <tv@xxxxxxxx> · Fri, 25 Nov 2011 14:12:32 +0100

On 25 November 2011, 12:43, Cédric Villemain wrote:
> On 25 November 2011 11:25, Tomas Vondra <tv@xxxxxxxx> wrote:
>> On 24 November 2011, 23:19, Cédric Villemain wrote:
>>>
>>> It seems you have an issue with your checkpoint sync time; it is
>>> fixed in 9.1 and backported to 9.0 here:
>>> http://projects.2ndquadrant.com/backports
>>
>> People generally don't want to apply backports on their own, especially
>> when it's a production server and when it's unclear whether the backport
>> actually fixes the issue they have. I'm not sure it does here.
>
> I agree that most people don't want to do that themselves, but if it
> happens to be the solution they can proceed or ask someone to do it.
> People want to see their production system back to normal. The limited
> information here is not enough to be sure, but the checkpoint sync
> times are clear: they are not right.
> It is very probable that compacting the fsync requests will help, but
> it is not certain yet that it is required.

Yes, the sync times are quite crazy. Especially given the tiny number of
buffers to be written and the fact that the SSD should handle random I/O
quite well.
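
For scale, here is a quick sketch of the arithmetic behind the buffer counts
mentioned further down in this thread (about 700 buffers usually, 3192 at
most). It assumes the default 8 kB block size, which nobody in the thread
has confirmed for this server; the function name is just for illustration.

```python
# Assumption: default PostgreSQL block size of 8 kB (8192 bytes).
BLOCK_SIZE = 8192


def buffers_to_mib(buffers, block_size=BLOCK_SIZE):
    """Convert a number of shared buffers to MiB."""
    return buffers * block_size / (1024 * 1024)


# Buffer counts quoted in this thread from the checkpoint log lines.
for n in (700, 3192):
    print(f"{n} buffers = {buffers_to_mib(n):.1f} MiB")
```

That is roughly 5.5 MiB and 25 MiB per checkpoint, which is a very small
amount of data for any SSD to absorb.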

>>> It is possible you have other problems that explain the issue you
>>> have. An immediate solution before trying a patch is to reduce your
>>> shared_buffers setting to something very low, like 1GB.
>>
>> Well, low shared_buffers was used mainly before 8.3, when spread
>> checkpoints were not available. It prevents the I/O overload when
>> the database suddenly decides to write all of the dirty buffers. But
>> he's on 9.0 (so he already has spread checkpoints).
>
> It is a different animal here.
>
>>
>> Plus the number of buffers he's writing is negligible - usually about
>> 700 buffers (6MB), 3192 buffers (25MB) at most. That surely should not
>> be a problem for the SSD he's using.
>
> See the blog entry from Greg Smith:
>  http://blog.2ndquadrant.com/en/2011/06/backporting-and-checkpoint-tro.html
>
> And the slides of his talk at PGCon 2011:
>  http://www.2ndquadrant.com/static/2quad/media/pdfs/talks/WriteStuff-PGCon2011.pdf
>
> I was just pointing out that there are known issues in this area, with
> known solutions.

Thanks for the links, interesting stuff. Still, my impression is that the
SSD is stressed by something else, and the high fsync times during a
checkpoint are merely a symptom. So fixing the checkpoints (using the
backport) won't actually fix the issue. But I'm just guessing here.

> Getting more information on vacuum and bgwriter activity should help
> too.

Yes, that'd be nice. Gaëtan, can you post the bgwriter-related options from
postgresql.conf and two snapshots from pg_stat_bgwriter (say 5 minutes
apart, collected when the db is slow)? A complete 'iotop -o' output would
be nice too.
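
To show what I'd do with the two snapshots, here is a minimal sketch that
subtracts one from the other and reports what share of the buffer writes
was done by the backends themselves. Each snapshot could come from
something like "SELECT * FROM pg_stat_bgwriter". The helper names are
hypothetical and the numbers are made-up placeholders, not data from this
server.

```python
def bgwriter_delta(before, after):
    """Per-counter difference between two pg_stat_bgwriter snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


def backend_write_share(delta):
    """Fraction of buffers written by backends rather than by
    checkpoints or the background writer."""
    total = (delta["buffers_checkpoint"]
             + delta["buffers_clean"]
             + delta["buffers_backend"])
    return delta["buffers_backend"] / total if total else 0.0


# Placeholder values only, standing in for two real snapshots.
before = {"checkpoints_timed": 100, "checkpoints_req": 5,
          "buffers_checkpoint": 10000, "buffers_clean": 2000,
          "buffers_backend": 3000}
after = {"checkpoints_timed": 101, "checkpoints_req": 5,
         "buffers_checkpoint": 10700, "buffers_clean": 2100,
         "buffers_backend": 3200}

delta = bgwriter_delta(before, after)
print(delta)
print(f"backend share of writes: {backend_write_share(delta):.0%}")
```

A high backend share, or a growing checkpoints_req, would suggest the
checkpoints and the bgwriter are not keeping up, but that is only a rule
of thumb.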

BTW what filesystem are we talking about? What mount options have you used?

What about /proc/sys/vm/dirty_expire_centisecs?
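
If it's easier, a small script along these lines would dump that value
together with the related dirty-page settings. This is only a sketch: it
assumes a Linux box with the usual files under /proc/sys/vm, and the
function name and the choice of which settings to read are mine.

```python
from pathlib import Path

# Kernel writeback settings worth looking at alongside the one asked about.
DIRTY_KNOBS = (
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
    "dirty_ratio",
    "dirty_background_ratio",
)


def read_vm_settings(names=DIRTY_KNOBS, root="/proc/sys/vm"):
    """Read integer sysctl values from files under `root`.

    A setting that is missing or unreadable is reported as None.
    """
    settings = {}
    for name in names:
        try:
            settings[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"{name} = {value}")
```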

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general