Hi Merlin:
I'm sorry to reply so late. I don't have the exact result of this query:
> number of active queries with average duration:
> SELECT query, count(*), avg(now() - query_start) FROM
> pg_stat_activity where state != 'idle' GROUP BY 1;
There are on average about 100 active connections during the load event, and every query takes 2000 ms or more.
How can I capture a backtrace during the load event?
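Is something like this what you mean? This is only my guess, assuming gdb is installed on the server and <pid> is the pid of one busy backend taken from pg_stat_activity:

# pick one busy backend while the load event is happening
psql -c "SELECT pid FROM pg_stat_activity WHERE state != 'idle' LIMIT 1"

# attach to it briefly and print its call stack
gdb -p <pid> -batch -ex "bt"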
[root@pg1 ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
[root@pg1 ~]# cat /sys/kernel/mm/transparent_hugepage/defrag
[always] madvise never
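If THP turns out to be part of the problem, I guess I could turn it off at runtime like this (just my understanding; it does not persist across reboots):

# disable transparent hugepages and defrag until the next reboot
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag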
657985552@xxxxxx
From: Merlin Moncure
Date: 2015-11-10 23:55
To: 莎士比亚说:
CC: Bill Moran; pgsql-general
Subject: Re: ??: postgres cpu 100% need help

On Tue, Nov 10, 2015 at 8:26 AM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
> On Sun, Nov 8, 2015 at 7:22 AM, 莎士比亚说: <657985552@xxxxxx> wrote:
>> Hi Moran and others;
>>
>> Yesterday I got the pg problem again, and I used perf top; the observation
>> follows:
>> PerfTop: 11574 irqs/sec  kernel: 2.2%  exact: 0.0% [4000Hz cycles],
>> (all, 32 CPUs)
>> 81.39% postgres      [.] s_lock
>>  5.42% postgres      [.] LWLockAcquire
>>  4.59% postgres      [.] LWLockRelease
>>  3.06% postgres      [.] TransactionIdIsInProgress
>>  0.38% postgres      [.] PinBuffer
>>  0.31% postgres      [.] TransactionIdPrecedes
>>  0.27% postgres      [.] UnpinBuffer
>>  0.19% postgres      [.] TransactionIdIsCurrentTransactionId
>>  0.16% postgres      [.] heap_hot_search_buffer
>>  0.15% [kernel]      [k] number.isra.1
>>  0.14% [kernel]      [k] kallsyms_expand_symbol.constprop.1
>>  0.10% [kernel]      [k] module_get_kallsym
>>  0.10% libc-2.17.so  [.] __strcmp_sse42
>>  0.09% [kernel]      [k] _raw_spin_lock
>>  0.09% postgres      [.] hash_search_with_hash_value
>>
>> Is it a spin lock problem? I need everyone's help to solve the problem. Thanks!
>
> Yup, spinlock problem. These can be difficult. What we need now is
> some metrics. They must be captured during the load event:
>
> *) number of active queries with average duration:
> SELECT query, count(*), avg(now() - query_start) FROM
> pg_stat_activity where state != 'idle' GROUP BY 1;
>
> *) context switches via "vmstat 1"; get a snapshot during load and
> during non-load time for comparison
>
> Solution to this problem will probably be one or more of:
> 1) Significantly downgrade shared_buffers (say, to 4GB)
>
> 2) Upgrade database to 9.4 and hope for the best
>
> 3) Capture backtrace during load event to determine exactly which path
> is going into spinlock
>
> 4) Install pgbouncer or another connection pooler to limit active
> queries on database
>
> 5) Install experimental patches to custom compiled database to test
> and verify a hypothetical fix
>
> Out of those 5 things, which are possible for you to do? Best case
> scenario is that you have a non-essential server that reproduces the
> issue.

Can we also see output of:

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

merlin
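P.S. For the vmstat comparison you asked for, is this the kind of capture you mean? The file names are just examples:

# about 30 seconds of samples while the load event is happening
vmstat 1 30 > vmstat_during_load.txt

# the same capture during a normal period, for comparison
vmstat 1 30 > vmstat_normal.txt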