Hi Merlin:
I'm sorry to reply so late. I don't have the exact result of this query:
> number of active queries with average duration:
> SELECT query, count(*), avg(now() - query_start) FROM
> pg_stat_activity where state != 'idle' GROUP BY 1;
There are on average about 100 active connections during the load event, and every query takes 2000 ms or more.
How can I capture a backtrace during the load event?
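Is something like this what you mean? This is only my guess, assuming gdb is installed on the server and <pid> is the pid of one busy backend taken from pg_stat_activity:

# pick one busy backend while the load event is happening
psql -c "SELECT pid FROM pg_stat_activity WHERE state != 'idle' LIMIT 1"

# attach to it briefly and print its call stack
gdb -p <pid> -batch -ex "bt"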
[root@pg1 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
[root@pg1 ~]# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
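If THP turns out to be part of the problem, I guess I could turn it off at runtime like this (just my understanding; it does not persist across reboots):

# disable transparent hugepages and defrag until the next reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag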
657985552@xxxxxx
From: Merlin Moncure
Date: 2015-11-10 23:55
To: 莎士比亚说:
CC: Bill Moran; pgsql-general
Subject: Re: ??: postgres cpu 100% need help

On Tue, Nov 10, 2015 at 8:26 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Sun, Nov 8, 2015 at 7:22 AM, 莎士比亚说: <657985552@xxxxxx> wrote:
>> Hi Moran and others;
>>
>> Yesterday I got the pg problem again, and I used perf top; the observation
>> follows:
>> PerfTop: 11574 irqs/sec  kernel: 2.2%  exact: 0.0% [4000Hz cycles],
>> (all, 32 CPUs)
>> 81.39% postgres      [.] s_lock
>>  5.42% postgres      [.] LWLockAcquire
>>  4.59% postgres      [.] LWLockRelease
>>  3.06% postgres      [.] TransactionIdIsInProgress
>>  0.38% postgres      [.] PinBuffer
>>  0.31% postgres      [.] TransactionIdPrecedes
>>  0.27% postgres      [.] UnpinBuffer
>>  0.19% postgres      [.] TransactionIdIsCurrentTransactionId
>>  0.16% postgres      [.] heap_hot_search_buffer
>>  0.15% [kernel]      [k] number.isra.1
>>  0.14% [kernel]      [k] kallsyms_expand_symbol.constprop.1
>>  0.10% [kernel]      [k] module_get_kallsym
>>  0.10% libc-2.17.so  [.] __strcmp_sse42
>>  0.09% [kernel]      [k] _raw_spin_lock
>>  0.09% postgres      [.] hash_search_with_hash_value
>>
>> Is it a spin lock problem? I need everyone's help to solve the problem. Thanks!
>
> Yup, spinlock problem. These can be difficult. What we need now is
> some metrics. They must be captured during the load event:
>
> *) number of active queries with average duration:
> SELECT query, count(*), avg(now() - query_start) FROM
> pg_stat_activity where state != 'idle' GROUP BY 1;
>
> *) context switches via "vmstat 1"; get a snapshot during load and
> during non-load time for comparison
>
> Solution to this problem will probably be one or more of:
> 1) Significantly downgrade shared_buffers (say, to 4GB)
>
> 2) Upgrade database to 9.4 and hope for the best
>
> 3) Capture backtrace during load event to determine exactly which path
> is going into spinlock
>
> 4) Install pgbouncer or another connection pooler to limit active
> queries on database
>
> 5) Install experimental patches to custom compiled database to test
> and verify a hypothetical fix
>
> Out of those 5 things, which are possible for you to do? Best case
> scenario is that you have a non-essential server that reproduces the
> issue.

Can we also see output of:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

merlin
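P.S. For the vmstat comparison you asked for, is this the kind of capture you mean? The file names are just examples:

# about 30 seconds of samples while the load event is happening
vmstat 1 30 > vmstat_during_load.txt

# the same capture during a normal period, for comparison
vmstat 1 30 > vmstat_normal.txt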