Hi all,

We finally made some headway on this - we noticed messages like the one below in /var/log/messages whenever the issue happened:

Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure. order:1, mode:0x20
Mar 26 07:39:58 site-db01b kernel: Pid: 39066, comm: postmaster Not tainted 2.6.32-279.el6.x86_64 #1
Mar 26 07:39:58 site-db01b kernel: Call Trace:
Mar 26 07:39:58 site-db01b kernel: <IRQ>  [<ffffffff8112759f>] ? __alloc_pages_nodemask+0x77f/0x940
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8116297a>] ? fallback_alloc+0x1ba/0x270
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81161d62>] ? kmem_getpages+0x62/0x170
Mar 26 07:39:58 site-db01b kernel: [<ffffffff811623cf>] ? cache_grow+0x2cf/0x320
Mar 26 07:39:58 site-db01b kernel: [<ffffffff811626f9>] ? ____cache_alloc_node+0x99/0x160
Mar 26 07:39:58 site-db01b kernel: [<ffffffff811634db>] ? kmem_cache_alloc+0x11b/0x190
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8142df32>] ? sk_clone+0x22/0x2e0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8142dc68>] ? sk_prot_alloc+0x48/0x1c0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81494ae3>] ? tcp_create_openreq_child+0x23/0x450
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8147bb86>] ? inet_csk_clone+0x16/0xd0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff814935be>] ? tcp_v4_rcv+0x4fe/0x8d0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81471200>] ? ip_local_deliver_finish+0x0/0x2d0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff814712dd>] ? ip_local_deliver_finish+0xdd/0x2d0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8149239d>] ? tcp_v4_syn_recv_sock+0x4d/0x310
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81494886>] ? tcp_check_req+0x226/0x460
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81491dbb>] ? tcp_v4_do_rcv+0x35b/0x430
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81489cfd>] ? tcp_rcv_established+0x38d/0x800
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81470fb5>] ? ip_rcv+0x275/0x350
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81470a2d>] ? ip_rcv_finish+0x12d/0x440
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81471568>] ? ip_local_deliver+0x98/0xa0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8143a7bb>] ? __netif_receive_skb+0x49b/0x6f0
Mar 26 07:39:58 site-db01b kernel: [<ffffffffa02fe1b4>] ? tg3_poll_work+0x654/0xe30 [tg3]
Mar 26 07:39:58 site-db01b kernel: [<ffffffffa02fe9dc>] ? tg3_poll_msix+0x4c/0x150 [tg3]
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8143ca38>] ? netif_receive_skb+0x58/0x60
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81073ec1>] ? __do_softirq+0xc1/0x1e0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8143cb40>] ? napi_skb_finish+0x50/0x70
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8143f193>] ? net_rx_action+0x103/0x2f0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8143f079>] ? napi_gro_receive+0x39/0x50
Mar 26 07:39:58 site-db01b kernel: [<ffffffff810db800>] ? handle_IRQ_event+0x60/0x170
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81073ca5>] ? irq_exit+0x85/0x90
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff81073f1f>] ? __do_softirq+0x11f/0x1e0
Mar 26 07:39:58 site-db01b kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30

Doing some digging on that, we disabled TSO, GSO, GRO and LRO on the NIC - and that seems to have helped.
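For reference, turning the offloads off looks roughly like this (a sketch only - "eth0" and the exact feature names depend on the driver and ethtool version, so check "ethtool -k" first):

  # disable the segmentation/receive offloads, then re-check the state
  sudo ethtool -K eth0 tso off gso off gro off lro off
  sudo ethtool -k eth0

The resulting state on our box is below.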
$ sudo ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off

However, I'm looking for more information on what's actually happening. The stack trace above seems to indicate that the kernel was unable to allocate two contiguous 4k pages (an order-1, 8k allocation) for the network stack - likely needed for GSO/GRO. What I'm wondering is why the kernel couldn't find even 8k on a 768G RAM server with the bulk of memory sitting in buffers/cache:

root@site-db01b:/proc # free -m
             total       used       free     shared    buffers     cached
Mem:        775382     773354       2028          0       1403     738735
-/+ buffers/cache:      33215     742166
Swap:            0          0          0

Looking a little deeper, I saw signs of memory being heavily fragmented:

root@site-db01b:/var/log # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal  13069      0      0      0      0      0      0      0      0      0      1
Node 1, zone   Normal 652315  36885   1168      0      1      1      0      1      1      1      0

Node 0's Normal zone has 13069 free 4k blocks but zero free 8k blocks, which is likely what caused the allocation failure, I'd think.

A little while later, though, buddyinfo changed and suddenly there was a lot more memory available in 8k blocks:

root@site-db01b:/proc # cat /proc/buddyinfo
Node 0, zone      DMA      1      1      2      2      2      1      0      0      1      1      3
Node 0, zone    DMA32      8      7      8      7     10      8      7     11      9      5     92
Node 0, zone   Normal   9645   5495   1115      0      0      0      0      0      0      0      1
Node 1, zone   Normal 409734  10953      1      0      1      1      0      1      1      1      0

(Note the change in the "Node 0, zone Normal" line - free 8k blocks went up from 0 to 5495.)

Does anyone have any idea why memory was so fragmented, and what causes it to get defragmented? Is it something postgres does? Are there kernel-specific settings that control it?

Thanks,
Karthik

On 3/14/14 3:37 PM, "Anand Kumar, Karthik" <Karthik.AnandKumar@xxxxxxxxxxxxxx> wrote:

>For anyone that's still following - we tried upgrading to postgres 9.3.3 -
>that hasn't helped.
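Coming back to the fragmentation question above: the kernel-side knobs that look relevant are sketched here. This is untested speculation on my part - the value below is only a placeholder, and /proc/sys/vm/compact_memory is only present on kernels built with memory compaction:

  # watch free blocks per order (columns are orders 0..10, i.e. 4k, 8k, ..., 4M)
  watch -n 5 cat /proc/buddyinfo

  # ask the kernel to compact/defragment memory right now, where the knob exists
  echo 1 > /proc/sys/vm/compact_memory

  # keep a larger reserve free, so atomic order-1 allocations from softirq
  # context are less likely to hit empty free lists (value is a placeholder)
  sysctl -w vm.min_free_kbytes=1048576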
>
>Running an strace on the pid that was consuming the highest CPU at the
>time of the outage shows:
>
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91881569, {{12, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(89325587, {{14, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90964037, {{4, 1, 0}}, 1) = 0
>semop(90308657, {{5, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(88866821, {{12, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90439733, {{13, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90341426, {{2, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90308657, {{5, 1, 0}}, 1) = 0
>semop(91881569, {{12, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(88866821, {{12, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91881569, {{12, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90865730, {{5, 1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(91521110, {{12, -1, 0}}, 1) = 0
>semop(90865730, {{5, 1, 0}}, 1) = 0
>
>I've seen other people talk of this problem with a lot of semop calls,
>haven't yet found a clear solution.
>Anyone have any ideas?
>
>I've also downloaded the perf tool based on
>http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html - will see what
>that has to show.
>
>Thanks,
>Karthik
>
>
>On 3/11/14 1:06 PM, "John R Pierce" <pierce@xxxxxxxxxxxx> wrote:
>
>>On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
>>> We typically see about 500-700 active queries at a time
>>
>>if these are primarily small/fast queries, like OLTP operations, and you
>>DONT have 200-400 CPU cores on this server, you will likely find that if
>>you use a queueing mechanism to only execute about 2X your CPU core
>>count concurrently, you will get MORE total transactions/second than
>>trying to do 500-700 at once.
>>
>>if your apps are using persistent connections, then the session pooling
>>model won't do any good, you should use transaction pooling. you want
>>the actual active query count to be tunable, probably down around 2X the
>>cpu core count, depending on various things. some folks say, CPU
>>cores/threads plus disk spindles is the optimal number.
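(For anyone following along: a transaction-pooling setup along the lines John describes might look roughly like the pgbouncer sketch below. This is illustrative only - the database name, paths, port and pool size are placeholders, not our actual configuration.)

  ; pgbouncer.ini - illustrative sketch only
  [databases]
  sitedb = host=127.0.0.1 port=5432 dbname=sitedb

  [pgbouncer]
  listen_addr = 127.0.0.1
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  ; apps hold persistent connections, so pool per transaction, not per session
  pool_mode = transaction
  ; cap concurrently executing queries near 2x the CPU core count
  default_pool_size = 64
  ; the 500-700 client connections queue here instead of inside postgres
  max_client_conn = 1000

The application would then connect to port 6432 instead of 5432, and pgbouncer hands out server connections one transaction at a time.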
>> >> >> >>-- >>john r pierce 37N 122W >>somewhere on the middle of the left coast >> >> >> >>-- >>Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) >>To make changes to your subscription: >>http://www.postgresql.org/mailpref/pgsql-general > > > >-- >Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general