Hi Robert, Thanks for your help as well. You're right about checkpoints, it's running pretty good at start then encounter heavy i/os. I've changed theses settings and also reduced work_mem a little and reduced effective_cache_size btw. LOG: parameter "work_mem" changed to "96MB" LOG: parameter "effective_cache_size" changed to "24GB" LOG: parameter "checkpoint_segments" changed to "40" LOG: parameter "checkpoint_timeout" changed to "5min" Apparently, it's been running fine since I made the first changes recommenced by Tomas. Let's wait for a couple of hours again to confirm this. Gaëtan Le 24/11/11 16:39, « Robert Treat » <rob@xxxxxxxxxx> a écrit : >On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra <tv@xxxxxxxx> wrote: >> On 24 Listopad 2011, 14:51, Gaëtan Allart wrote: >>> Hello everyone, >>> >>> I'm having some troubles with a Postgresql server. >>> We're using PG has a database backend for a very big website (lots of >>>data >>> and much traffic). >>> >>> The issue : server suddenly (1H after restart) becomes slow (queries >>>not >>> responding), load rises (>20 instead of 1), iowait rises (20 to 70%) >>> >>> Version : 9.0.5 >>> Server : Dual Xeon X5650 (24 cores total) >>> Memory : 48 GB >>> Disks : SSD >>> >>> >>> Top when overloaded : >> >> Top is not the most useful tool here, I guess. Use "iotop" (will show >>you >> which processes are doing the I/O) and tools like vmstat / iostat. >> >>> Postgresql.conf : >>> >>> max_connections = 50 >>> shared_buffers = 12G >>> temp_buffers = 40MB >>> work_mem = 128MB >>> maintenance_work_mem = 256MB >>> max_files_per_process = 8192 >>> checkpoint_segments = 256 >>> checkpoint_timeout = 30min >>> checkpoint_completion_target = 0.9 >> >> Fine. Let's see the options that look suspicious. >> > >I think you missed some suspicious settings... I'd recommend setting >shared buffers to 8gb, and I'd likely reduce checkpoint segements to >30 and set the checkpoint timeout back to 5 minutes. Everything about >the way this server is configured (including those vm settings) is >pushing it towards delaying the WAL/Buffer/Checkpoint as long as >possible, which matches with the idea of good performance initial >followed by a period of poor performance and heavy i/o. > >On a side note, I'd guess your work_mem is probably too high. 50 >(connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM, >which is 25% of total ram on the box. That doesn't necessarily mean >game over, but it seem like it wouldn't be that hard to get thrashing >being set up that way. YMMV. > >Robert Treat >conjecture: xzilla.net >consulting: omniti.com -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general