Re: General performance/load issue

"Tomas Vondra" <tv@xxxxxxxx> · Fri, 25 Nov 2011 11:25:19 +0100

On 24 Listopad 2011, 23:19, Cédric Villemain wrote:
> Le 24 novembre 2011 20:38, Gaëtan Allart <gaetan@xxxxxxxxxxx> a écrit :
>> Finally, it "crashed" againŠ :-(
>>
>> Here's the output of iotop while databased was inaccessible :
>>
>> 32361 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(34847) idle
>> 32244 be/4 postgres  163.48 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(34660) SELECT
>> 32045 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(33765) SELECT
>> 32158 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(34112) SELECT
>> 32242 be/4 postgres    7.78 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(34632) SELECT
>> 32372 be/4 postgres    0.00 B/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(38858) idle in transaction
>> 32231 be/4 postgres   15.57 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(38602) SELECT
>> 28811 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(40594) SELECT
>> 32190 be/4 postgres    3.89 K/s    0.00 B/s  0.00 % 99.99 % postgres:
>> mydb
>> mydb host(38497) SELECT

This iotop output is rather incomplete - it does not show which of the
processes actually did the I/O. The fact that a process does 99% of the
I/O and yet performs no reads or writes is a bit strange.

Use "iotop -o" and post the result.

>> Does this help?
>
> yes.
> It seem you have an issue with your checkpoint syncing time, it is
> fixed in 9.1 and backported in 9.0 here :
> http://projects.2ndquadrant.com/backports

People generally don't want to apply backports on their own, especially
when it's a production server and when it's unclear it actually fixes the
issue they have. I'm not sure about that.

> It is possible you have other problems that explains the issue you
> have. An immediate solution before trying a patch is to reduce your
> shared_buffer setting to something very low, like 1GB.

Well, using low shared_buffers was used especially before 8.3, when the
spread checkpoints were not available. It prevents the I/O overload when
the database suddenly decides to write all of the dirty buffers. But he's
on 9.0 (so he already has spread checkpoints).

Plus the number of buffers he's writing is negligible - usually about 700
buffers (6MB), 3192 buffers (25MB) at most. That surely should not be a
problem for the SSD he's using.

Tomas

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general