Tomas, I've enabled logging of checkpoints. I'm waiting for the next i/o crisisŠ Gaëtan Le 24/11/11 17:02, « Tomas Vondra » <tv@xxxxxxxx> a écrit : >On 24 Listopad 2011, 16:39, Robert Treat wrote: >> On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra <tv@xxxxxxxx> wrote: >>> On 24 Listopad 2011, 14:51, Gaëtan Allart wrote: >>>> Postgresql.conf : >>>> >>>> max_connections = 50 >>>> shared_buffers = 12G >>>> temp_buffers = 40MB >>>> work_mem = 128MB >>>> maintenance_work_mem = 256MB >>>> max_files_per_process = 8192 >>>> checkpoint_segments = 256 >>>> checkpoint_timeout = 30min >>>> checkpoint_completion_target = 0.9 >>> >>> Fine. Let's see the options that look suspicious. >>> >> >> I think you missed some suspicious settings... I'd recommend setting >> shared buffers to 8gb, and I'd likely reduce checkpoint segements to >> 30 and set the checkpoint timeout back to 5 minutes. Everything about >> the way this server is configured (including those vm settings) is >> pushing it towards delaying the WAL/Buffer/Checkpoint as long as >> possible, which matches with the idea of good performance initial >> followed by a period of poor performance and heavy i/o. > >Yes, checkpoints were my first thought too. OTOH the OP reported that most >of the I/O is caused by WAL writer - that's not exactly the part that does >the work during checkpoint. Plus the WAL may not be postponed, as it's >usually O_DIRECT and fsynced, right. > >You're right that the writes are postponed, but I generally see that as a >good thing when combined with spread checkpoints. And even with those vm >settings (about 3.2GB for background writes), I wouldn't expect this >behaviour (because the page cache usually expires after 30 seconds). Say >you need 100% of the shared buffers is dirty and need to be written. You >have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30 >seconds expire there might be about 240MB before the pdflush starts to >write the data to the SSD. And that can surely handle more than 50MB/s. So >why the long delay? The question is what else is going on there. > >But all this is just guessing - I want to see the log_checkpoint message, >iostat results etc. > >> On a side note, I'd guess your work_mem is probably too high. 50 >> (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM, >> which is 25% of total ram on the box. That doesn't necessarily mean >> game over, but it seem like it wouldn't be that hard to get thrashing >> being set up that way. YMMV. > >This is one of the reasons why effective_cache_size should be lower than >32GB, probably ... > >Tomas -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general