Re: CPU spikes and transactions

On Tue, May 13, 2014 at 6:04 PM, Dave Owens <dave@xxxxxxxxxxxxx> wrote:
> Hi,
>
> Apologies for resurrecting this old thread, but it seems like this is better than starting a new conversation.
>
> We are now running 9.1.13 and have doubled the CPU and memory.  So 2x 16 Opteron 6276 (32 cores total), and 64GB memory.  shared_buffers set to 20G, effective_cache_size set to 40GB.
>
> We were able to record perf data during the latest incident of high CPU utilization. perf report is below:
>
> Samples: 31M of event 'cycles', Event count (approx.): 16289978380877
>  44.74%       postmaster  [kernel.kallsyms]             [k] _spin_lock_irqsave
>  15.03%       postmaster  postgres                      [.] 0x00000000002ea937
>   3.14%       postmaster  postgres                      [.] s_lock
>   2.30%       postmaster  [kernel.kallsyms]             [k] compaction_alloc
>   2.21%       postmaster  postgres                      [.] HeapTupleSatisfiesMVCC
>   1.75%       postmaster  postgres                      [.] hash_search_with_hash_value
>   1.25%       postmaster  postgres                      [.] ExecScanHashBucket
>   1.20%       postmaster  postgres                      [.] SHMQueueNext
>   1.05%       postmaster  postgres                      [.] slot_getattr
>   1.04%             init  [kernel.kallsyms]             [k] native_safe_halt
>   0.73%       postmaster  postgres                      [.] LWLockAcquire
>   0.59%       postmaster  [kernel.kallsyms]             [k] page_fault
>   0.52%       postmaster  postgres                      [.] ExecQual
>   0.40%       postmaster  postgres                      [.] ExecStoreTuple
>   0.38%       postmaster  postgres                      [.] ExecScan
>   0.37%       postmaster  postgres                      [.] check_stack_depth
>   0.35%       postmaster  postgres                      [.] SearchCatCache
>   0.35%       postmaster  postgres                      [.] CheckForSerializableConflictOut
>   0.34%       postmaster  postgres                      [.] LWLockRelease
>   0.30%       postmaster  postgres                      [.] _bt_checkkeys
>   0.28%       postmaster  libc-2.12.so                  [.] memcpy
>   0.27%       postmaster  [kernel.kallsyms]             [k] get_pageblock_flags_group
>   0.27%       postmaster  postgres                      [.] int4eq
>   0.27%       postmaster  postgres                      [.] heap_page_prune_opt
>   0.27%       postmaster  postgres                      [.] pgstat_init_function_usage
>   0.26%       postmaster  [kernel.kallsyms]             [k] _spin_lock
>   0.25%       postmaster  postgres                      [.] _bt_compare
>   0.24%       postmaster  postgres                      [.] pgstat_end_function_usage
>
> ...please let me know if we need to produce the report differently to be useful.
>
> We will begin reducing shared_buffers incrementally over the coming days.


This is definitely pointing at THP (transparent huge pages) compaction, which is increasingly emerging as a possible culprit for CPU spikes that appear suddenly and resolve just as suddenly.  The evidence I see is:

*) Lots of time in the kernel (44.74% in _spin_lock_irqsave)
*) "compaction_alloc" showing up in the profile
*) An otherwise normal postgres profile (not lots of time in s_lock, LWLock, or other weird things)
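
As a rough way to check the first two points against a report like the one above, here is a minimal Python sketch that sums the listed percentages by kernel ([k]) versus userspace ([.]) and picks out compaction-related symbols. The function names and the regular expression are my own and assume the column layout shown in the quoted perf report; other perf versions or options may lay the columns out differently.

```python
import re
from collections import defaultdict

# Matches perf report lines shaped like the ones quoted in this thread:
#  44.74%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irqsave
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)")


def parse_report(report_text):
    """Yield (percent, command, object, kind, symbol) for each matching line."""
    for line in report_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pct, comm, dso, kind, sym = m.groups()
            yield float(pct), comm, dso, kind, sym


def kernel_user_split(report_text):
    """Sum the listed percentages: 'k' entries are kernel, the rest userspace."""
    totals = defaultdict(float)
    for pct, _comm, _dso, kind, _sym in parse_report(report_text):
        totals["kernel" if kind == "k" else "user"] += pct
    return dict(totals)


def compaction_symbols(report_text):
    """Return symbols whose name mentions compaction (hypothetical filter)."""
    return [sym for _pct, _comm, _dso, _kind, sym in parse_report(report_text)
            if "compact" in sym]


# First four lines of the report from this thread, used as sample input.
SAMPLE = """\
 44.74%       postmaster  [kernel.kallsyms]             [k] _spin_lock_irqsave
 15.03%       postmaster  postgres                      [.] 0x00000000002ea937
  3.14%       postmaster  postgres                      [.] s_lock
  2.30%       postmaster  [kernel.kallsyms]             [k] compaction_alloc
"""

if __name__ == "__main__":
    print(kernel_user_split(SAMPLE))
    print(compaction_symbols(SAMPLE))
```

This only adds up the lines that appear in the report, so the totals are a lower bound on each side rather than a full accounting of the samples.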


Please check whether THP is enabled (see here: http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadaoop-workloads/ which discusses Hadoop and various other workloads).  If it is enabled, consider disabling it; this will revert to pre-Linux 6 behavior.  If you are going to attack this from the point of view of lowering shared_buffers, do not bother with incremental steps: head straight for 2GB, or it's unlikely the problem will be fixed.

THP compaction is not a postgres problem: MySQL is affected, as are other server platforms.  If THP is indeed causing the problem, it couldn't hurt to get on the horn with the Linux guys.  Last I heard they claimed this kind of thing was fixed, but I don't know where things stand now.
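
For checking the THP setting, here is a minimal Python sketch that reads the sysfs files and reports which option is active (the kernel marks the active one with square brackets, e.g. "always madvise [never]"). It assumes the mainline path /sys/kernel/mm/transparent_hugepage and, for RHEL/CentOS 6 kernels, /sys/kernel/mm/redhat_transparent_hugepage; the function names are my own, and on a system with neither directory it simply returns nothing.

```python
from pathlib import Path

# Candidate sysfs locations. The second is an assumption for RHEL/CentOS 6
# kernels, which expose THP under a "redhat_" prefixed directory.
THP_DIRS = [
    Path("/sys/kernel/mm/transparent_hugepage"),
    Path("/sys/kernel/mm/redhat_transparent_hugepage"),
]


def parse_active(text):
    """Return the bracketed (active) option, e.g. 'never' from
    'always madvise [never]', or None if no option is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_status():
    """Map each readable THP sysfs file to its active setting."""
    status = {}
    for directory in THP_DIRS:
        for name in ("enabled", "defrag"):
            path = directory / name
            try:
                status[str(path)] = parse_active(path.read_text())
            except OSError:
                # File missing or unreadable on this kernel; skip it.
                continue
    return status


if __name__ == "__main__":
    for path, value in thp_status().items():
        print(f"{path}: {value}")
```

If "enabled" reports "always", THP is on. Writing "never" to that file as root turns it off until the next reboot; making that permanent depends on the distribution, so treat this as a starting point rather than a full procedure.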

merlin

