Le 24 novembre 2011 17:02, Tomas Vondra <tv@xxxxxxxx> a écrit : > On 24 Listopad 2011, 16:39, Robert Treat wrote: >> On Thu, Nov 24, 2011 at 9:09 AM, Tomas Vondra <tv@xxxxxxxx> wrote: >>> On 24 Listopad 2011, 14:51, Gaëtan Allart wrote: >>>> Postgresql.conf : >>>> >>>> max_connections = 50 >>>> shared_buffers = 12G >>>> temp_buffers = 40MB >>>> work_mem = 128MB >>>> maintenance_work_mem = 256MB >>>> max_files_per_process = 8192 >>>> checkpoint_segments = 256 >>>> checkpoint_timeout = 30min >>>> checkpoint_completion_target = 0.9 >>> >>> Fine. Let's see the options that look suspicious. >>> >> >> I think you missed some suspicious settings... I'd recommend setting >> shared buffers to 8gb, and I'd likely reduce checkpoint segements to >> 30 and set the checkpoint timeout back to 5 minutes. Everything about >> the way this server is configured (including those vm settings) is >> pushing it towards delaying the WAL/Buffer/Checkpoint as long as >> possible, which matches with the idea of good performance initial >> followed by a period of poor performance and heavy i/o. > > Yes, checkpoints were my first thought too. OTOH the OP reported that most > of the I/O is caused by WAL writer - that's not exactly the part that does > the work during checkpoint. Plus the WAL may not be postponed, as it's > usually O_DIRECT and fsynced, right. > > You're right that the writes are postponed, but I generally see that as a > good thing when combined with spread checkpoints. And even with those vm > settings (about 3.2GB for background writes), I wouldn't expect this > behaviour (because the page cache usually expires after 30 seconds). Say > you need 100% of the shared buffers is dirty and need to be written. You > have 27 minutes (30*0.9) to do that - that means about 8MB/s. With 30 > seconds expire there might be about 240MB before the pdflush starts to > write the data to the SSD. And that can surely handle more than 50MB/s. So > why the long delay? The question is what else is going on there. > > But all this is just guessing - I want to see the log_checkpoint message, > iostat results etc. > >> On a side note, I'd guess your work_mem is probably too high. 50 >> (connections) x 128 (mb work mem) x 2 (sorts per query) = 12GB RAM, >> which is 25% of total ram on the box. That doesn't necessarily mean >> game over, but it seem like it wouldn't be that hard to get thrashing >> being set up that way. YMMV. > > This is one of the reasons why effective_cache_size should be lower than > 32GB, probably ... according to 'free' output, 38GB is what is here right now. effective_cache_size is just informative, so you can put it to 1TB without memory issue. And, it is OS cache+PG cache. There is not enougth information yet to be sure on what's happening. log_checkpoint output will help for sure. -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general