Would you be able to get a stack trace of a backend that's holding an
extension lock? Or maybe perf would provide some insight.
The outage occurred again but briefer. There were no ExclusiveLock
messages, presumably because the timeout for logging locks was not
exceeded. But all available connection slots were used and many
incoming requests were denied.
Below you'll find the "perf report" and the "zoomed in" (I suppose
that's the callstack) of the top entry.
The top entry's call stack has these 4 postmaster functions near the top:
PinBuffer
LockRelease
hash_any
HeapTupleSatisfiesMVCC
We'll be rolling back from 9.4.1 to 9.3.6 tonight, hoping that will
resolve the issue.
===== BELOW A 10 SECOND perf top CAPTURE DURING THE OUTAGE
# ========
# captured on: Mon Apr 20 20:34:43 2015
# hostname : db1a
# os release : 2.6.32-504.1.3.el6.x86_64
# perf version : 2.6.32-504.1.3.el6.x86_64.debug
# arch : x86_64
# nrcpus online : 24
# nrcpus avail : 24
# cpudesc : Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
# cpuid : GenuineIntel,6,45,7
# total memory : 49373964 kB
# cmdline : /usr/bin/perf record -a -o 2015-04-20_20:34:28 sleep 10
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2
= 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1,
precise_ip = 0, attr_mmap2 = 0, attr_mmap = 1, attr_mmap_data = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: uncore_r3qpi_1 = 23, uncore_r3qpi_0 = 22, uncore_r2pcie
= 21, uncore_qpi_1 = 20, uncore_qpi_0 = 19, uncore_imc_3 = 18,
uncore_imc_2 = 17, uncore_imc_1 = 16, uncore_imc_0 = 15, uncore_ha = 14,
uncore_pcu = 13, uncore_cbox_5 = 12, uncore_c
# ========
#
# Samples: 960K of event 'cycles'
# Event count (approx.): 757057831613
#
# Overhead Command Shared Object
Symbol
# ........ ............... .....................
.................................................
#
59.73% postmaster [kernel.kallsyms] [k] compaction_alloc
1.31% postmaster [kernel.kallsyms] [k] _spin_lock
0.94% postmaster [kernel.kallsyms] [k]
__reset_isolation_suitable
0.78% postmaster [kernel.kallsyms] [k] compact_zone
0.67% postmaster [kernel.kallsyms] [k]
get_pageblock_flags_group
0.64% postmaster [kernel.kallsyms] [k] copy_page_c
0.48% :13410 [kernel.kallsyms] [k] compaction_alloc
0.45% :13465 [kernel.kallsyms] [k] compaction_alloc
0.45% postmaster [kernel.kallsyms] [k] clear_page_c
0.44% postmaster postgres [.]
hash_search_with_hash_value
0.41% :13324 [kernel.kallsyms] [k] compaction_alloc
0.40% :13561 [kernel.kallsyms] [k] compaction_alloc
0.38% :13374 [kernel.kallsyms] [k] compaction_alloc
0.37% :13272 [kernel.kallsyms] [k] compaction_alloc
0.37% postmaster [kernel.kallsyms] [k] unmap_vmas
0.36% postmaster [kernel.kallsyms] [k] page_fault
0.36% :13380 [kernel.kallsyms] [k] compaction_alloc
0.35% :13482 [kernel.kallsyms] [k] compaction_alloc
0.34% :13555 [kernel.kallsyms] [k] compaction_alloc
0.34% postmaster [kernel.kallsyms] [k]
set_pageblock_flags_group
0.34% postmaster [kernel.kallsyms] [k] page_check_address
0.33% :13528 [kernel.kallsyms] [k] compaction_alloc
0.33% :13464 [kernel.kallsyms] [k] compaction_alloc
0.31% :13547 [kernel.kallsyms] [k] compaction_alloc
0.30% postmaster [kernel.kallsyms] [k] _spin_lock_irqsave
0.29% :13395 [kernel.kallsyms] [k] compaction_alloc
0.29% :13546 [kernel.kallsyms] [k] compaction_alloc
0.28% postmaster [kernel.kallsyms] [k]
remove_migration_pte
0.28% :13355 [kernel.kallsyms] [k] compaction_alloc
0.28% postmaster [kernel.kallsyms] [k] list_del
0.28% :13432 [kernel.kallsyms] [k] compaction_alloc
0.27% :13258 [kernel.kallsyms] [k] compaction_alloc
0.27% :13328 [kernel.kallsyms] [k] compaction_alloc
0.26% postmaster [kernel.kallsyms] [k] __wake_up_bit
0.26% :13361 [kernel.kallsyms] [k] compaction_alloc
0.25% :13334 [kernel.kallsyms] [k] compaction_alloc
0.25% :13366 [kernel.kallsyms] [k] compaction_alloc
0.25% :13549 [kernel.kallsyms] [k] compaction_alloc
0.25% :13530 [kernel.kallsyms] [k] compaction_alloc
0.24% :13391 [kernel.kallsyms] [k] compaction_alloc
0.24% :13387 [kernel.kallsyms] [k] compaction_alloc
0.24% :13364 [kernel.kallsyms] [k] compaction_alloc
0.24% :13529 [kernel.kallsyms] [k] compaction_alloc
0.24% postmaster [kernel.kallsyms] [k] find_get_page
0.23% postmaster postgres [.] _bt_compare
0.22% :13522 [kernel.kallsyms] [k] compaction_alloc
0.22% :13579 [kernel.kallsyms] [k] compaction_alloc
0.22% :13412 [kernel.kallsyms] [k] compaction_alloc
0.21% postmaster postgres [.] AllocSetAlloc
0.21% :13407 [kernel.kallsyms] [k] compaction_alloc
0.20% :13569 [kernel.kallsyms] [k] compaction_alloc
0.20% :13322 [kernel.kallsyms] [k] compaction_alloc
0.20% :13554 [kernel.kallsyms] [k] compaction_alloc
0.20% :13327 [kernel.kallsyms] [k] compaction_alloc
===== BELOW IS THE ZOOM OF THE TOP ENTRY
Samples: 604K of event 'cycles', Event count (approx.): 476360854137,
Thread: postmaster(11565)
94.92% postmaster [kernel.kallsyms] [k] compaction_alloc
1.49% postmaster [kernel.kallsyms] [k] __reset_isolation_suitable
1.01% postmaster [kernel.kallsyms] [k] copy_page_c
0.54% postmaster [kernel.kallsyms] [k] set_pageblock_flags_group
0.41% postmaster [kernel.kallsyms] [k] __wake_up_bit
0.37% postmaster [kernel.kallsyms] [k] find_get_page
0.16% postmaster [kernel.kallsyms] [k] find_vma
0.16% postmaster libc-2.12.so [.] _int_malloc
0.16% postmaster postgres [.] PinBuffer
–’0.14% postmaster [kernel.kallsyms] [k] __alloc_pages_nodemask
–’0.11% postmaster [kernel.kallsyms] [k] filemap_fault
–’0.08% postmaster postgres [.] LockRelease
–’0.08% postmaster postgres [.] hash_any
–’0.06% postmaster libc-2.12.so [.] _int_free
–’0.06% postmaster [kernel.kallsyms] [k] page_remove_rmap
–’0.03% postmaster postgres [.] HeapTupleSatisfiesMVCC
–’0.03% postmaster [kernel.kallsyms] [k] free_pages_and_swap_cache
–’0.03% postmaster postgres [.] s_lock
–’0.02% postmaster [kernel.kallsyms] [k] unlink_anon_vmas
–’0.02% postmaster [kernel.kallsyms] [k] mem_cgroup_lru_add_list
–’0.02% postmaster postgres [.] ReindexIsProcessingIndex
–’0.01% postmaster [kernel.kallsyms] [k] set_page_dirty
–’0.01% postmaster postgres [.] eqjoinsel
–’0.01% postmaster [kernel.kallsyms] [k] perf_event_aux_ctx
–’0.01% postmaster postgres [.] hash_seq_search
–’0.01% postmaster postgres [.] get_mergejoin_opfamilies
–’0.01% postmaster [kernel.kallsyms] [k] acl_permission_check
–’0.01% postmaster libc-2.12.so [.] __sigsetjmp
–’0.01% postmaster postgres [.] GetUserId
–’0.00% postmaster postgres [.] pull_up_subqueries_recurse
–’0.00% postmaster [kernel.kallsyms] [k] do_brk
–’0.00% postmaster postgres [.] SearchSysCacheList
–’0.00% postmaster postgres [.] join_search_one_level
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general