> "Vaughn, Adam (IMS)" <VaughnA@xxxxxxxxxx> wrote: > > I made all of the changes you mentioned except for the > shared_buffers (which will require a downtime I have set for > tonight). I do have another question though, why did you pick 512 > MB for the new setting of shared_buffers? Everything I've ever > read says that 25% of available RAM is a conservative value for > shared_buffers. Well, in general 25% may be the best for overall *throughput*, but it can often lead to latency spikes, so it depends on what you care about. The curve of throughput against shared_buffers has gotten pretty close to horizontal by around 1GB in a lot of my tests. It doesn't, after all, reduce the size of your cache; it affects how much of the cache access requires an OS call versus staying within PostgreSQL functions. > Also, we had another one of these instances earlier today. During > the 23 minute commit a single CPU was at 98% and it looked like > all writes were backed up waiting for the commit to finalize. What other symptoms did you notice? Did the context switch rate jump? If I remember correctly you are going over the network to your persistent storage -- did you see anything anomalous on the network around that time? Greg Smith has really turned the tuning of this into a science, with incremental adjustments and some specific monitoring techniques. I strongly recommend you look for that on the list archives, in recordings or slides from his presentations on the topice, or his recent book. > During the time our writing never got above 25 MB/s (far less than > we can handle). At that rate it should have been able to write your 6GB shared buffers in about 4.1 minutes. What did the disk writes *average* during the incident? 6GB in 23 minutes would be 4.5MB per second. If your average falls anywhere near that number, you have evidence that the buffers were mostly dirty for some reason and the checkpoint glutted on writing them. > Is it possible that we're missing an index somewhere or there's > something else going on? A missing index would probably show high disk reads from sequential passing large tables. A context switch storm or network problems seem like the two most likely causes to me, but it would pay to monitor anything you can to help pin down the bottleneck next time this happens. -Kevin -- Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin