Uhm… I'm seeing dozens and dozens of temporary file creations in the logs:

LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1425", size 25340
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.195", size 2720340
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3495", size 24724
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2674", size 2712452
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1850", size 25284
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.822", size 2717464
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1050", size 25060
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.501", size 2700248
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3768", size 23156
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2132", size 2713204
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1886", size 23744
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.311", size 2829600
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3849", size 25088
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2352", size 2770352
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1267", size 25592
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.765", size 2803744
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3775", size 24444
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2686", size 2858836
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1094", size 24948
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.106", size 2800140
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1399", size 23912
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.211", size 2761788
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.3747", size 27188
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp13772.2639", size 2749672
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.1053", size 24276
LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp15388.452", size 2948712

Does this help?

Gaëtan

On 26/11/11 17:47, « Gaëtan Allart » <gaetan@xxxxxxxxxxx> wrote:

>Rahh :/
>
>It's getting worse and worse :/ The database has to be restarted every 2 hours,
>causing a lot of traffic loss :/
>
>As far as the server is concerned, it was running great 7 days ago and had
>been running like this for months. I really don't get why it suddenly went
>"I/O-ing"…
>
>Here's the current postgresql.conf:
>
>shared_buffers = 6GB                 # min 128kB
>                                     # (change requires restart)
>temp_buffers = 40MB                  # min 800kB
>                                     # (change requires restart)
>work_mem = 96MB                      # min 64kB
>maintenance_work_mem = 256MB         # min 1MB
>max_stack_depth = 2MB                # min 100kB
>
>effective_cache_size = 32GB
>
>max_files_per_process = 8192         # min 25
>                                     # (change requires restart)
>
>fsync = on                           # turns forced synchronization on or off
>synchronous_commit = off             # immediate fsync at commit
>                                     # supported by the operating system:
>wal_buffers = 16MB                   # min 32kB
>
>checkpoint_segments = 40             # in logfile segments, min 1, 16MB each
>checkpoint_timeout = 5min            # range 30s-1h
>checkpoint_completion_target = 0.9   # checkpoint target duration, 0.0 - 1.0
>
>seq_page_cost = 1.0                  # measured on an arbitrary scale
>random_page_cost = 2.0               # same scale as above
>
>Isn't there anything I can do to keep my database up and running, even with
>bad performance?
>
>Filesystem is ext3, running on top of a hardware RAID-1 config.
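A side note on tracking these down: the log lines above show the files but not the statements that created them. As a minimal sketch (the query and table below are made up; run EXPLAIN ANALYZE on one of the real suspects from pg_stat_activity instead), a spilled sort shows up directly in the plan output:

-- Hypothetical example query; substitute one of the statements that is
-- actually writing to disk.
EXPLAIN ANALYZE
SELECT author_id, post_date
FROM some_large_table
ORDER BY post_date;

-- A sort that fits in work_mem reports something like:
--   Sort Method:  quicksort  Memory: 1024kB
-- while a sort that spilled to a temporary file reports something like:
--   Sort Method:  external merge  Disk: 2720kB
-- Hash joins that split into multiple batches also create temporary files.

Also worth noting: the larger files above are only about 2.7 MB each, far below the work_mem = 96MB in the config quoted above, so (unless work_mem is overridden per session or per role) they look more like hash-join batch files or many small spills than a single sort outgrowing work_mem.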
>
>Gaëtan
>
>On 26/11/11 15:12, « Tomas Vondra » <tv@xxxxxxxx> wrote:
>
>>On 26 November 2011, 10:45, Gaëtan Allart wrote:
>>> A better view of iotop:
>>>
>>>   TID  PRIO  USER      DISK READ   DISK WRITE   SWAPIN     IO     COMMAND
>>> 31875  be/4  postgres   0.00 B/s   15.23 M/s    0.00 %   0.00 %   postgres: database database 46.105.104.205(50228) SELECT
>>> 30985  be/4  postgres   0.00 B/s   10.55 M/s    0.00 %   0.00 %   postgres: database database 46.105.104.205(47672) SELECT
>>>
>>> As you can see, the SELECTs are writing a great deal to disk.
>>>
>>> At the moment, pg_stat_activity shows many, many queries running at the
>>> same time. It looks like all queries are suddenly writing to disk, not one
>>> particular query, which makes me think of a buffer issue or something.
>>
>>No, I don't think this is an issue with the size of shared buffers. That
>>would show up in pg_stat_bgwriter - buffers_backend would grow much
>>faster - and that's not the case.
>>
>>So I'm guessing it's either hint bits or sorting. If I had to bet, I'd
>>probably go with hint bits - see for example this thread:
>>
>>http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php
>>
>>It more or less matches what you have described so far.
>>
>>> This is an example of a query writing 10 MB/s:
>>>
>>> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
>>> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
>>> "table"."url_article" AS "url_article", "table"."original_url" AS
>>> "original_url", "table"."name" AS "name", "table"."description" AS
>>> "description", "table"."content" AS "content", "table"."permis" AS
>>> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
>>> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
>>> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
>>> "add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
>>> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
>>> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
>>> "table"."score" AS "score", "arti
>>
>>Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to
>>post it).
>>
>>> Checkpoint logs still show very long write times:
>>>
>>> LOG:  checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
>>> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
>>> total=63.625 s
>>
>>No, that's fine - that's what spread checkpoints do. Once checkpoint_timeout
>>expires, the system decides to perform a checkpoint, i.e. it has to write
>>all dirty (modified) shared buffers to disk. But it knows the next
>>checkpoint will happen in checkpoint_timeout, so it has about 5 minutes to
>>write all the data.
>>
>>So it says something like "I need to write 540MB and I want to do it in 270
>>seconds (5 minutes * completion_target), so I'll write at 2MB/s". The write
>>phase is therefore expected to take long. But the data has to be synced at
>>the end, and that's where the problems usually happen - so the important
>>number is 'sync', and that has improved significantly. 6 seconds is not
>>great, but it's not a big issue, I guess.
>>
>>> Couldn't this be a hardware issue?
>>
>>I don't think so. The problem is that the queries write a lot of data, and
>>the SSD can't help with that. SSDs provide much better random performance
>>than spinning drives, but their sequential performance is not much better,
>>so pushing 70MB/s of writes may already be enough to saturate the SSD.
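On the hint-bits guess above: the first query that reads rows committed since the last VACUUM (or since those pages were last fully read) has to set the hint bits itself, which dirties every page it touches and turns a plain SELECT into a writer. A rough way to test that hypothesis (the table name here is only a placeholder; use one of the tables the writing SELECTs actually read):

-- A plain VACUUM scans the table, sets the hint bits and writes the pages out
-- once, so subsequent read-only queries no longer have to re-dirty them.
VACUUM VERBOSE some_recently_loaded_table;

If the write volume of the SELECTs drops noticeably afterwards, hint bits were the likely culprit; if it doesn't, the sorting/temp-file angle looks more promising.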
>>
>>Have you tested the basic performance (with bonnie++, dd, etc.) before going
>>into production? There are some common SSD-related mistakes (e.g. not
>>aligning the partitions properly) that can easily cost 30% of the
>>performance. Fixing that won't solve this problem, although you might get
>>somewhat better performance.
>>
>>Tomas
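For the shared-buffers angle Tomas ruled out above, the counters he refers to live in the pg_stat_bgwriter view. A minimal sketch: sample it a couple of times (say, before and during one of the bad periods) and compare the deltas rather than the absolute numbers:

-- buffers_checkpoint: buffers written by checkpoints
-- buffers_clean:      buffers written by the background writer
-- buffers_backend:    buffers the backends had to write out themselves;
--                     if this one grows much faster than the other two,
--                     shared_buffers / bgwriter tuning is worth a look.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;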