Rahh :/ It's getting worse and worse :/ The database has to be restarted
every 2 hours, causing a lot of traffic loss :/ As far as the server is
concerned, it was running great 7 days ago and had been running like that
for months. I really don't get why it suddenly started "I/O-ing" like this…

Here's the current postgresql.conf:

shared_buffers = 6GB                 # min 128kB (change requires restart)
temp_buffers = 40MB                  # min 800kB (change requires restart)
work_mem = 96MB                      # min 64kB
maintenance_work_mem = 256MB         # min 1MB
max_stack_depth = 2MB                # min 100kB
effective_cache_size = 32GB
max_files_per_process = 8192         # min 25 (change requires restart)
fsync = on                           # turns forced synchronization on or off
synchronous_commit = off             # immediate fsync at commit
wal_buffers = 16MB                   # min 32kB
checkpoint_segments = 40             # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min            # range 30s-1h
checkpoint_completion_target = 0.9   # checkpoint target duration, 0.0 - 1.0
seq_page_cost = 1.0                  # measured on an arbitrary scale
random_page_cost = 2.0               # same scale as above

Isn't there anything I can do to keep my database up & running, even with
bad performance?

The filesystem is ext3, running on a hardware RAID-1 config.

Gaëtan

On 26/11/11 15:12, « Tomas Vondra » <tv@xxxxxxxx> wrote:

>On 26 November 2011, 10:45, Gaëtan Allart wrote:
>> A better view of iotop:
>>
>>   TID  PRIO  USER      DISK READ   DISK WRITE  SWAPIN    IO    COMMAND
>> 31875  be/4  postgres  0.00 B/s    15.23 M/s   0.00 %  0.00 %  postgres:
>> database database 46.105.104.205(50228) SELECT
>> 30985  be/4  postgres  0.00 B/s    10.55 M/s   0.00 %  0.00 %  postgres:
>> database database 46.105.104.205(47672) SELECT
>>
>> As you can see, the SELECTs are writing a lot to disk.
>>
>> At this moment, pg_stat_activity shows many, many queries running at the
>> same time.
>> It looks like all the queries are suddenly writing to disk, not one
>> particular query, which makes me think of a buffer issue or something.
>
>No, I don't think this is an issue with the size of shared buffers. That
>would influence pg_stat_bgwriter - buffers_backend would grow much faster,
>and that's not the case.
>
>So I'm guessing it's either hint bits or sorting. If I had to bet, I'd
>probably guess hint bits - see for example this thread
>
>http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php
>
>It more or less matches what you've described so far.
>
>> This is an example of a query writing 10 MB/s:
>>
>> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
>> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
>> "table"."url_article" AS "url_article", "table"."original_url" AS
>> "original_url", "table"."name" AS "name", "table"."description" AS
>> "description", "table"."content" AS "content", "table"."permis" AS
>> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
>> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
>> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
>> "add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
>> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
>> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
>> "table"."score" AS "score", "arti
>
>Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to
>post it).
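
If hint bits do turn out to be the culprit, two things can be checked. This
is only a sketch - the query below is a simplified stand-in for the real one
above, "table" is the anonymised table name, and the BUFFERS option needs
PostgreSQL 9.0 or later:

-- Show, per plan node, how many shared/temp blocks the SELECT reads and
-- writes; a large "written" count during a plain read is the typical
-- hint-bit (or on-disk sort) symptom.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM "table";

-- After a bulk INSERT/UPDATE, a VACUUM sets the hint bits itself, so later
-- SELECTs don't have to dirty and rewrite the pages.
VACUUM ANALYZE "table";

The depesz site mentioned above only needs the plain EXPLAIN ANALYZE text;
the BUFFERS part is just to read the write volume directly.
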
>
>> Checkpoint logs still show very long write times:
>>
>> LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
>> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
>> total=63.625 s
>
>No, that's fine - that's what spread checkpoints do. Once
>checkpoint_timeout expires, the system decides to perform a checkpoint,
>i.e. it has to write all dirty (modified) shared buffers to disk. But it
>knows the next checkpoint will happen in checkpoint_timeout, so it has
>about 5 minutes to write all the data.
>
>So it says something like "I need to write 540MB and I want to write it in
>270 seconds (5 minutes * checkpoint_completion_target), so I'll write at
>2MB/s." The write phase is therefore expected to take a long time. But the
>data has to be synced at the end, and that's where the problems usually
>happen - so the important number is 'sync', and that has improved
>significantly. 6 seconds is not great, but it's not a big issue I guess.
>
>> Couldn't this be a hardware issue?
>
>I don't think so. The problem is that the queries write a lot of data, and
>the SSD can't help in this case. SSDs provide much better random
>performance (compared to spinning disks), but their sequential performance
>is not much better, so pushing 70MB/s to the SSD may already be using it
>at full capacity.
>
>Have you tested the basic performance (with bonnie++, dd, etc.) before
>going into production? There are some common SSD-related mistakes (e.g. not
>aligning the partitions properly) that can easily cause a 30% performance
>loss. Fixing that won't solve the problem, though it could buy you a bit
>better performance.
>
>Tomas
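
For reference, the pg_stat_bgwriter counters Tomas mentions can be watched
directly. A minimal sketch (these columns exist since 8.3); sample it twice
a few minutes apart and compare the deltas:

-- buffers_backend growing fast would mean the backends themselves are
-- forced to flush dirty pages (the too-small-shared_buffers scenario
-- Tomas rules out); growth in buffers_checkpoint / buffers_clean points
-- at the checkpointer and the background writer instead.
SELECT now() AS sampled_at,
       checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_alloc
  FROM pg_stat_bgwriter;

As noted above, buffers_backend staying flat while the SELECTs keep writing
is what points away from shared_buffers and towards hint bits or sorting.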