Uhm… I'm seeing dozens and dozens of temporary file creations in the logs:

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1425", size 25340
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.195", size 2720340
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3495", size 24724
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2674", size 2712452
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1850", size 25284
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.822", size 2717464
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1050", size 25060
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.501", size 2700248
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3768", size 23156
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2132", size 2713204
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1886", size 23744
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.311", size 2829600
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3849", size 25088
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2352", size 2770352
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1267", size 25592
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.765", size 2803744
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3775", size 24444
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2686", size 2858836
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1094", size 24948
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.106", size 2800140
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1399", size 23912
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.211", size 2761788
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3747", size 27188
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2639", size 2749672
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1053", size 24276
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.452", size 2948712

Does this help?

Gaëtan

On 26/11/11 17:47, « Gaëtan Allart » <gaetan@xxxxxxxxxxx> wrote:

>Rahh :/
>
>It's getting worse and worse :/ The database has to be restarted every 2 hours,
>causing a lot of traffic loss :/
>
>As far as the server is concerned, it was running great 7 days ago and had
>been running like this for months. I really don't get why it suddenly went
>"I/O-ing"…
>
>Here's the current postgresql.conf:
>
>shared_buffers = 6GB                 # min 128kB
>                                     # (change requires restart)
>temp_buffers = 40MB                  # min 800kB
>                                     # (change requires restart)
>work_mem = 96MB                      # min 64kB
>maintenance_work_mem = 256MB         # min 1MB
>max_stack_depth = 2MB                # min 100kB
>
>effective_cache_size = 32GB
>
>max_files_per_process = 8192         # min 25
>                                     # (change requires restart)
>
>fsync = on                           # turns forced synchronization on or off
>synchronous_commit = off             # immediate fsync at commit
>                                     # supported by the operating system:
>wal_buffers = 16MB                   # min 32kB
>
>checkpoint_segments = 40             # in logfile segments, min 1, 16MB each
>checkpoint_timeout = 5min            # range 30s-1h
>checkpoint_completion_target = 0.9   # checkpoint target duration, 0.0 - 1.0
>
>seq_page_cost = 1.0                  # measured on an arbitrary scale
>random_page_cost = 2.0               # same scale as above
>
>Isn't there anything I can do to keep my database up and running, even with
>bad performance?
>
>Filesystem is ext3, running on top of a hardware RAID-1 config.
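A side note on tracking these down: the log lines above show the files but not the statements that created them. As a minimal sketch (the query and table below are made up; run EXPLAIN ANALYZE on one of the real suspects from pg_stat_activity instead), a spilled sort shows up directly in the plan output:

-- Hypothetical example query; substitute one of the statements that is
-- actually writing to disk.
EXPLAIN ANALYZE
SELECT author_id, post_date
FROM some_large_table
ORDER BY post_date;

-- A sort that fits in work_mem reports something like:
--   Sort Method:  quicksort  Memory: 1024kB
-- while a sort that spilled to a temporary file reports something like:
--   Sort Method:  external merge  Disk: 2720kB
-- Hash joins that split into multiple batches also create temporary files.

Also worth noting: the larger files above are only about 2.7 MB each, far below the work_mem = 96MB in the config quoted above, so (unless work_mem is overridden per session or per role) they look more like hash-join batch files or many small spills than a single sort outgrowing work_mem.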
>
>Gaëtan
>
>On 26/11/11 15:12, « Tomas Vondra » <tv@xxxxxxxx> wrote:
>
>>On 26 November 2011, 10:45, Gaëtan Allart wrote:
>>> A better view of iotop:
>>>
>>>   TID  PRIO  USER      DISK READ   DISK WRITE   SWAPIN     IO     COMMAND
>>> 31875  be/4  postgres   0.00 B/s   15.23 M/s    0.00 %   0.00 %   postgres: database database 46.105.104.205(50228) SELECT
>>> 30985  be/4  postgres   0.00 B/s   10.55 M/s    0.00 %   0.00 %   postgres: database database 46.105.104.205(47672) SELECT
>>>
>>> As you can see, the SELECTs are writing a great deal to disk.
>>>
>>> At the moment, pg_stat_activity shows many, many queries running at the
>>> same time. It looks like all queries are suddenly writing to disk, not one
>>> particular query, which makes me think of a buffer issue or something.
>>
>>No, I don't think this is an issue with the size of shared buffers. That
>>would show up in pg_stat_bgwriter - buffers_backend would grow much
>>faster - and that's not the case.
>>
>>So I'm guessing it's either hint bits or sorting. If I had to bet, I'd
>>probably go with hint bits - see for example this thread:
>>
>>http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php
>>
>>It more or less matches what you have described so far.
>>
>>> This is an example of a query writing 10 MB/s:
>>>
>>> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
>>> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
>>> "table"."url_article" AS "url_article", "table"."original_url" AS
>>> "original_url", "table"."name" AS "name", "table"."description" AS
>>> "description", "table"."content" AS "content", "table"."permis" AS
>>> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
>>> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
>>> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
>>> "add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
>>> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
>>> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
>>> "table"."score" AS "score", "arti
>>
>>Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to
>>post it).
>>
>>> Checkpoint logs still show very long write times:
>>>
>>> LOG:  checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
>>> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
>>> total=63.625 s
>>
>>No, that's fine - that's what spread checkpoints do. Once checkpoint_timeout
>>expires, the system decides to perform a checkpoint, i.e. it has to write
>>all dirty (modified) shared buffers to disk. But it knows the next
>>checkpoint will happen in checkpoint_timeout, so it has about 5 minutes to
>>write all the data.
>>
>>So it says something like "I need to write 540MB and I want to do it in 270
>>seconds (5 minutes * completion_target), so I'll write at 2MB/s". The write
>>phase is therefore expected to take long. But the data has to be synced at
>>the end, and that's where the problems usually happen - so the important
>>number is 'sync', and that has improved significantly. 6 seconds is not
>>great, but it's not a big issue, I guess.
>>
>>> Couldn't this be a hardware issue?
>>
>>I don't think so. The problem is that the queries write a lot of data, and
>>the SSD can't help with that. SSDs provide much better random performance
>>than spinning drives, but their sequential performance is not much better,
>>so pushing 70MB/s of writes may already be enough to saturate the SSD.
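On the hint-bits guess above: the first query that reads rows committed since the last VACUUM (or since those pages were last fully read) has to set the hint bits itself, which dirties every page it touches and turns a plain SELECT into a writer. A rough way to test that hypothesis (the table name here is only a placeholder; use one of the tables the writing SELECTs actually read):

-- A plain VACUUM scans the table, sets the hint bits and writes the pages out
-- once, so subsequent read-only queries no longer have to re-dirty them.
VACUUM VERBOSE some_recently_loaded_table;

If the write volume of the SELECTs drops noticeably afterwards, hint bits were the likely culprit; if it doesn't, the sorting/temp-file angle looks more promising.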
>>
>>Have you tested the basic performance (with bonnie++, dd, etc.) before going
>>into production? There are some common SSD-related mistakes (e.g. not
>>aligning the partitions properly) that can easily cost 30% of the
>>performance. Fixing that won't solve this problem, although you might get
>>somewhat better performance.
>>
>>Tomas
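For the shared-buffers angle Tomas ruled out above, the counters he refers to live in the pg_stat_bgwriter view. A minimal sketch: sample it a couple of times (say, before and during one of the bad periods) and compare the deltas rather than the absolute numbers:

-- buffers_checkpoint: buffers written by checkpoints
-- buffers_clean:      buffers written by the background writer
-- buffers_backend:    buffers the backends had to write out themselves;
--                     if this one grows much faster than the other two,
--                     shared_buffers / bgwriter tuning is worth a look.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;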