On Tue, May 13, 2014 at 4:04 PM, Dave Owens <dave@xxxxxxxxxxxxx> wrote:
> Hi,
>
> Apologies for resurrecting this old thread, but it seems like this is better than starting a new conversation.
>
> We are now running 9.1.13 and have doubled the CPU and memory. So 2x 16 Opteron 6276 (32 cores total), and 64GB memory. shared_buffers set to 20G, effective_cache_size set to 40GB.
>
> We were able to record perf data during the latest incident of high CPU utilization. perf report is below:
>
> Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
>  44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
>  15.03%  postmaster  postgres           [.] 0x00000000002ea937
>   3.14%  postmaster  postgres           [.] s_lock
>   2.30%  postmaster  [kernel.kallsyms]  [k] compaction_alloc
>   2.21%  postmaster  postgres           [.] HeapTupleSatisfiesMVCC
compaction_alloc points to the "transparent huge pages" kernel problem, while HeapTupleSatisfiesMVCC points to the problem of each backend taking the ProcArrayLock for every not-yet-committed tuple it encounters. I don't know which of those leads to the _spin_lock_irqsave. Transparent huge pages seems the more likely cause of that, but perhaps both contribute.
If it is the former, you can find other messages on this list about disabling it. If it is the latter, your best bet is to commit your bulk inserts as soon as possible (this might be improved in 9.5, if we can figure out how to test the alternatives). Please let us know what works.
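For the transparent huge pages option, disabling it usually comes down to turning off THP (or at least its defrag/compaction pass) via sysfs. Roughly something like the following -- the exact paths vary by kernel, and RHEL/CentOS 6 kernels expose them under /sys/kernel/mm/redhat_transparent_hugepage instead, so check what your system actually has:

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

That only lasts until the next reboot, so it usually also goes into rc.local (or onto the kernel command line as transparent_hugepage=never) to make it persistent.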
If lowering shared_buffers works, I wonder whether disabling transparent huge page compaction would let you bring shared_buffers back up again.
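A quick way to see what the kernel is currently doing is to read those same sysfs files, e.g.:

    cat /sys/kernel/mm/transparent_hugepage/enabled
    cat /sys/kernel/mm/transparent_hugepage/defrag

The value shown in brackets is the active setting; if both already say [never], then THP compaction is probably not what is burning your CPU.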
Cheers,
Jeff