As others have suggested, shared_buffers = 48GB is too large; you should never need to go above 8GB. I have a similar server and mine has:
shared_buffers = 8GB
checkpoint_completion_target = 0.9
This looks like a problem of dirty memory being flushed to disk. You should set your monitoring to track dirty memory from /proc/meminfo and check whether it correlates with the slowdowns.

Also, vm.dirty_background_bytes should always be a fraction of vm.dirty_bytes: once more than vm.dirty_bytes bytes are dirty, the kernel stalls the writing processes until enough dirty pages have been flushed, whereas when dirty memory reaches vm.dirty_background_bytes it just starts flushing those pages to disk in the background. As far as I remember, vm.dirty_bytes should be configured to be a little less than the cache size of your RAID controller, and vm.dirty_background_bytes about 4 times smaller.
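As a rough sketch, assuming a RAID controller with a 512MB battery-backed cache (the numbers below are illustrative, adjust them to your hardware), the settings in /etc/sysctl.conf could look something like:

vm.dirty_bytes = 402653184            # ~384MB, a little less than the 512MB BBU cache
vm.dirty_background_bytes = 100663296 # ~96MB, roughly 4x smaller

To watch dirty memory you can, for example, run:

grep -E 'Dirty|Writeback' /proc/meminfo

and feed those values into your monitoring system.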
Strahinja Kustudić | System Engineer | Nordeus
On Wed, Feb 6, 2013 at 10:12 PM, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
Johnny Tan <johnnydtan@xxxxxxxxx> wrote:
> Wouldn't this be controlled by our checkpoint settings, though?

Spread checkpoints made the issue less severe, but on servers with
a lot of RAM I've had to make the above changes (or even go lower
with shared_buffers) to prevent a burst of writes from overwhelming
the RAID controller's battery-backed cache. There may be other
things which could cause these symptoms, so I'm not certain that
this will help; but I have seen this as the cause and seen the
suggested changes help.
-Kevin