53.28% postmaster postgres [.] s_lock
6.22% postmaster postgres [.] 0x00000000001b4306
2.30% postmaster [kernel.kallsyms] [k] _spin_lock
2.23% postmaster postgres [.] LWLockAcquire
1.79% postmaster postgres [.] LWLockRelease
1.39% postmaster postgres [.] hash_search_with_hash_value
1.20% postmaster postgres [.] SearchCatCache
0.71% postmaster [kernel.kallsyms] [k] hrtimer_interrupt
0.56% postmaster [kernel.kallsyms] [k] tick_program_event
0.54% postmaster [kernel.kallsyms] [k] schedule
0.44% postmaster [kernel.kallsyms] [k] _spin_lock_irq
# Then zooming in on s_lock
# Annotate s_lock
...
99.04 │ test %al,%al
...
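
For those following along: s_lock is PostgreSQL's spinlock wait loop, and the
annotate output makes sense once you see the shape of it. It's a
test-and-test-and-set: waiters mostly spin on a plain read of the lock byte
and only retry the atomic exchange when the lock looks free. Below is a
minimal sketch of the pattern in C (simplified names and GCC builtins, not the
actual PostgreSQL source); the cheap "*lock != 0" peek is what compiles down
to the "test %al,%al" carrying ~99% of the samples.

typedef volatile unsigned char slock_t;

static int
tas(slock_t *lock)
{
    /* atomically set *lock to 1 and return the previous value */
    return __sync_lock_test_and_set(lock, 1);
}

void
spin_lock(slock_t *lock)
{
    /* test-and-test-and-set: peek with a plain load first, and only
     * attempt the (bus-locking) atomic exchange when it looks free */
    while (*lock != 0 || tas(lock))
        ;   /* the real s_lock adds backoff/delay logic here */
}

void
spin_unlock(slock_t *lock)
{
    __sync_lock_release(lock);   /* write 0 with release semantics */
}
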
# Zoom into postmaster(81487) thread
55.84% postmaster postgres [.] s_lock
6.52% postmaster postgres [.] 0x00000000001b4306
2.42% postmaster [kernel.kallsyms] [k] _spin_lock
2.34% postmaster postgres [.] LWLockAcquire
1.87% postmaster postgres [.] LWLockRelease
1.46% postmaster postgres [.] hash_search_with_hash_value
1.26% postmaster postgres [.] SearchCatCache
0.75% postmaster [kernel.kallsyms] [k] hrtimer_interrupt
0.59% postmaster [kernel.kallsyms] [k] tick_program_event
0.57% postmaster [kernel.kallsyms] [k] schedule
# Zoom into postgres DSO
65.75% postmaster [.] s_lock
7.68% postmaster [.] 0x00000000001b4306
2.75% postmaster [.] LWLockAcquire
2.20% postmaster [.] LWLockRelease
1.72% postmaster [.] hash_search_with_hash_value
1.49% postmaster [.] SearchCatCache
0.54% postmaster [.] _bt_compare
0.51% postmaster [.] _bt_checkkeys
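
Note who s_lock's neighbors are in that profile: LWLockAcquire and
LWLockRelease. In 9.3 (and earlier), each lightweight lock guards its own
state with a per-lock spinlock, so a pile-up on one hot LWLock surfaces in
perf as time burned inside s_lock. A rough sketch of that relationship,
building on the spin_lock sketch above (field and function names are
illustrative, not the actual source):

#include <stdbool.h>

/* reuses slock_t, spin_lock(), spin_unlock() from the sketch above */

typedef struct LWLock
{
    slock_t mutex;      /* spinlock guarding the counters below */
    char    exclusive;  /* 1 if held exclusively */
    int     shared;     /* number of shared holders */
} LWLock;

bool
lwlock_acquire_shared(LWLock *lock)
{
    /* every attempt, successful or not, takes the per-lock spinlock
     * first -- which is why a contended LWLock shows up as s_lock time */
    spin_lock(&lock->mutex);
    if (!lock->exclusive)
    {
        lock->shared++;
        spin_unlock(&lock->mutex);
        return true;
    }
    spin_unlock(&lock->mutex);
    return false;   /* the real code queues the backend and sleeps */
}

A call graph (perf record -g, then perf report) will show which LWLock
callers dominate.
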
On Thu, Jun 19, 2014 at 5:12 PM, Borislav Ivanov <bivanov@xxxxxxxxxxxxx> wrote:
> However, most people on our team think that the number of connections is
> purely a symptom of the actual problem. We would love to be wrong about
> this. But for now we feel the high number of connections contributes to
> preserving the problem, but it's not actually triggering it.
This is entirely correct. pgbouncer is not about preventing database load
but about limiting damage when it occurs. This is generally necessary in
environments where application servers keep piling on connections when
the database is not clearing queries fast enough.
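
For illustration, a minimal pgbouncer.ini along those lines (the database
name, path, and numbers are made up; tune them to your workload):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; accept the pile-up from the app servers...
max_client_conn = 2000
; ...but let only this many connections actually reach postgres
default_pool_size = 32

With pool_mode = transaction, a couple thousand client connections multiplex
over a few dozen server connections, so a stampede queues inside the pooler
instead of turning into hundreds of busy backends fighting over spinlocks.
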
In your case user% is dominating the load. Combined with the high
context-switch (cs) count, that strongly suggests spinlock contention;
running 'perf top' is essential for identifying the culprit. It's very
possible that 9.4 will fix your problem; see:
http://postgresql.1045698.n5.nabble.com/Cpu-usage-100-on-slave-s-lock-problem-td5768655.html
There was some poorly optimized code in the WAL replay.
merlin