On Tue, Oct 15, 2013 at 12:07:38PM -0700, Tony Kay wrote: > On Tue, Oct 15, 2013 at 10:26 AM, Julien Cigar <jcigar@xxxxxxxxx> wrote: > > > > > for two reasons: > > > > - some of the overhead of bgwriter and checkpoints is more or less linear > > in the size of shared_buffers, for example it could be possible that a > > large quantity of data could be dirty when a checkpoint occurs). > > > > - the OS cache is also being used for reads and writes, the larger > > shared_buffers is, the more you risk double buffering (same blocks > > in the OS cache and in the database buffer cache). > > > > Excellent. Thank you for the information. My suspicion has always been that > the shared_buffers are "level 1 cache", so it seems to me that you'd want > that to be large enough to hold your entire database if you could. However, > I'm now realizing that I was _also_ assuming the IPC shared memory was also > being locked via mlock to prevent swapping it out, and I'm now getting the on FreeBSD you can use kern.ipc.shm_use_phys=1 to lock shared memory pages in core (note that it's useless on 9.3 as mmap is now used to allocate shared memory) (and BTW I'm curious is someone has done benchmarks on FreeBSD + 9.3 + mmap, because enabling kern.ipc.shm_use_phys leads to a 2-4x perf improvement in some benchmarks) > feeling that this isn't true, which means the double buffering could lead > to swap space use on buffer cache pressure...which I do occasionally see in > ways I had not expected. you can't avoid double buffering sometime, for example if a block is read from disk and has not been requested previously it will first go to the OS cache and then to the buffer cache. In an ideal world block that are most frequently used should be in the database buffer cache, and others in the OS cache. > > We do streaming replication and also store them for snapshot PITR backups, > so I am intimately familiar with our write load, and I can say it is pretty > low (we ship a 16MB WAL file about every 10-15 minutes during our busiest > times). > > That said, I can see how an import that is doing a bunch of writes could > possibly spread those over a large area that would then consume a lot of > CPU on the writer and checkpoint; however, I do not see how either of those > would cause 40-60 different postgres backgroud processes (all running a > normally "light query") to spin off into oblivion unless the write work > load is somehow threaded into the background workers (which I'm sure it > isn't). So, I think we're still dealing with a spinlock issue. > > We're going to upgrade to 9.1.10 (with debug symbols) Thursday night and > add another 64GB of RAM. I'll tune shared_buffers down to 2GB at that time > and bump effective_cache_size up at the same time. My large history of sar > data will make it apparent pretty quickly if that is a win/lose/tie. > > If we have another spike in production, we'll be ready to measure it more > accurately. > > Thanks, > > Tony -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance