Rahh :/ It's getting worse and worse :/ The database has to be restarted
every 2 hours, causing a lot of traffic loss :/ As far as the server is
concerned, it was running great 7 days ago and had been running like that
for months. I really don't get why it suddenly started "I/O-ing" like this…

Here's the current postgresql.conf:

shared_buffers = 6GB                 # min 128kB (change requires restart)
temp_buffers = 40MB                  # min 800kB (change requires restart)
work_mem = 96MB                      # min 64kB
maintenance_work_mem = 256MB         # min 1MB
max_stack_depth = 2MB                # min 100kB
effective_cache_size = 32GB
max_files_per_process = 8192         # min 25 (change requires restart)
fsync = on                           # turns forced synchronization on or off
synchronous_commit = off             # immediate fsync at commit
wal_buffers = 16MB                   # min 32kB
checkpoint_segments = 40             # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min            # range 30s-1h
checkpoint_completion_target = 0.9   # checkpoint target duration, 0.0 - 1.0
seq_page_cost = 1.0                  # measured on an arbitrary scale
random_page_cost = 2.0               # same scale as above

Isn't there anything I can do to keep my database up & running, even with
bad performance?

The filesystem is ext3, running on a hardware RAID-1 config.

Gaëtan

On 26/11/11 15:12, « Tomas Vondra » <tv@xxxxxxxx> wrote:

>On 26 November 2011, 10:45, Gaëtan Allart wrote:
>> A better view of iotop:
>>
>>   TID  PRIO  USER      DISK READ   DISK WRITE  SWAPIN    IO    COMMAND
>> 31875  be/4  postgres  0.00 B/s    15.23 M/s   0.00 %  0.00 %  postgres:
>> database database 46.105.104.205(50228) SELECT
>> 30985  be/4  postgres  0.00 B/s    10.55 M/s   0.00 %  0.00 %  postgres:
>> database database 46.105.104.205(47672) SELECT
>>
>> As you can see, the SELECTs are writing a lot to disk.
>>
>> At this moment, pg_stat_activity shows many, many queries running at the
>> same time.
>> It looks like all the queries are suddenly writing to disk, not one
>> particular query, which makes me think of a buffer issue or something.
>
>No, I don't think this is an issue with the size of shared buffers. That
>would influence pg_stat_bgwriter - buffers_backend would grow much faster,
>and that's not the case.
>
>So I'm guessing it's either hint bits or sorting. If I had to bet, I'd
>probably guess hint bits - see for example this thread
>
>http://archives.postgresql.org/pgsql-performance/2008-05/msg00182.php
>
>It more or less matches what you've described so far.
>
>> This is an example of a query writing 10 MB/s:
>>
>> SELECT COUNT(*) FROM (SELECT DISTINCT "table"."id" AS "id",
>> "table"."flux_id" AS "flux_id", "table"."locale_id" AS "locale_id",
>> "table"."url_article" AS "url_article", "table"."original_url" AS
>> "original_url", "table"."name" AS "name", "table"."description" AS
>> "description", "table"."content" AS "content", "table"."permis" AS
>> "permis", "table"."reviewed" AS "reviewed", "table"."author_id" AS
>> "author_id", "table"."poster_id" AS "poster_id", "table"."post_date" AS
>> "post_date", "table"."edit_date" AS "edit_date", "table"."add_date" AS
>> "add_date", "table"."comments_open" AS "comments_open", "table"."site_id"
>> AS "site_id", "table"."is_local" AS "is_local", "table"."status" AS
>> "status", "table"."visits" AS "visits", "table"."votes" AS "votes",
>> "table"."score" AS "score", "arti
>
>Post the EXPLAIN ANALYZE output of this query (use explain.depesz.com to
>post it).
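
If hint bits do turn out to be the culprit, two things can be checked. This
is only a sketch - the query below is a simplified stand-in for the real one
above, "table" is the anonymised table name, and the BUFFERS option needs
PostgreSQL 9.0 or later:

-- Show, per plan node, how many shared/temp blocks the SELECT reads and
-- writes; a large "written" count during a plain read is the typical
-- hint-bit (or on-disk sort) symptom.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM "table";

-- After a bulk INSERT/UPDATE, a VACUUM sets the hint bits itself, so later
-- SELECTs don't have to dirty and rewrite the pages.
VACUUM ANALYZE "table";

The depesz site mentioned above only needs the plain EXPLAIN ANALYZE text;
the BUFFERS part is just to read the write volume directly.
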
>
>> Checkpoint logs still show very long write times:
>>
>> LOG: checkpoint complete: wrote 92 buffers (0.0%); 0 transaction log
>> file(s) added, 0 removed, 1 recycled; write=49.622 s, sync=6.510 s,
>> total=63.625 s
>
>No, that's fine - that's what spread checkpoints do. Once
>checkpoint_timeout expires, the system decides to perform a checkpoint,
>i.e. it has to write all dirty (modified) shared buffers to disk. But it
>knows the next checkpoint will happen in checkpoint_timeout, so it has
>about 5 minutes to write all the data.
>
>So it says something like "I need to write 540MB and I want to write it in
>270 seconds (5 minutes * checkpoint_completion_target), so I'll write at
>2MB/s." The write phase is therefore expected to take a long time. But the
>data has to be synced at the end, and that's where the problems usually
>happen - so the important number is 'sync', and that has improved
>significantly. 6 seconds is not great, but it's not a big issue I guess.
>
>> Couldn't this be a hardware issue?
>
>I don't think so. The problem is that the queries write a lot of data, and
>the SSD can't help in this case. SSDs provide much better random
>performance (compared to spinning disks), but their sequential performance
>is not much better, so pushing 70MB/s to the SSD may already be using it
>at full capacity.
>
>Have you tested the basic performance (with bonnie++, dd, etc.) before
>going into production? There are some common SSD-related mistakes (e.g. not
>aligning the partitions properly) that can easily cause a 30% performance
>loss. Fixing that won't solve the problem, though it could buy you a bit
>better performance.
>
>Tomas
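
For reference, the pg_stat_bgwriter counters Tomas mentions can be watched
directly. A minimal sketch (these columns exist since 8.3); sample it twice
a few minutes apart and compare the deltas:

-- buffers_backend growing fast would mean the backends themselves are
-- forced to flush dirty pages (the too-small-shared_buffers scenario
-- Tomas rules out); growth in buffers_checkpoint / buffers_clean points
-- at the checkpointer and the background writer instead.
SELECT now() AS sampled_at,
       checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_clean,
       buffers_backend,
       buffers_alloc
  FROM pg_stat_bgwriter;

As noted above, buffers_backend staying flat while the SELECTs keep writing
is what points away from shared_buffers and towards hint bits or sorting.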