On Thu, 20 Nov 2014 22:10:02 +0000 Bond, Darryl wrote:

> Brief outline:
>
> 6 node production cluster. Each node Dell R610, 8x1.4TB SAS disks,
> Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
>
> Ceph 0.80.7-0.el7.centos from the ceph repositories.
>
Which kernel?

Anyways, this has been discussed here very recently.
And I personally ran into this about 4 years ago when I first deployed
DRBD in combination with Mellanox Infiniband HCAs.

This is what makes things work for me (sysctl.d):
---
# Don't swap on these boxes
#vm/swappiness = 0
#vm/vfs_cache_pressure = 50
vm/min_free_kbytes = 524288
---
As you can see, I initially played with swappiness and cache pressure
as well, but the real solution was, and unfortunately still is, to keep
LOTS of memory free.
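For reference, this is roughly how you would apply such a setting at
runtime and make it persistent (the file name below is just an example,
and 524288 is the value that happens to work for my hardware; size it
to your own boxes):

---
# apply immediately, runtime only
sysctl -w vm.min_free_kbytes=524288

# persist across reboots, then load the file
echo "vm.min_free_kbytes = 524288" > /etc/sysctl.d/90-min-free.conf
sysctl -p /etc/sysctl.d/90-min-free.conf
---

The kernel dips into this reserve for atomic allocations like the ones
failing in your logs, so too low a value brings the failures back,
while too high a value just wastes RAM on a 32GB box.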
Regards,

Christian

> About 10 times per day, each node will oops with the following
> message.
>
> An example:
>
> Nov 21 07:07:50 ceph14-04 kernel: warn_alloc_failed: 366 callbacks suppressed
> Nov 21 07:07:50 ceph14-04 kernel: swapper/4: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: kswapd0: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 5 PID: 176 Comm: kswapd0 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: systemd-journal: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 9 PID: 704 Comm: systemd-journal Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020 000000005835f665 ffff88080f0a3a00 ffffffff815e239b
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 0 PID: 7453 Comm: ceph-osd Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: ffff88080f0a3a90
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: 000000009c9142fd
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
>
> or another example:
>
> Nov 20 09:03:09 ceph14-06 kernel: warn_alloc_failed: 3803 callbacks suppressed
> Nov 20 09:03:09 ceph14-06 kernel: swapper/11: page allocation failure: order:2, mode:0x104020
> Nov 20 09:03:09 ceph14-06 kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 20 09:03:09 ceph14-06 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 20 09:03:09 ceph14-06 kernel: 0000000000104020 dbf4eb51672ffc35 ffff88080f163a00 ffffffff815e239b
> Nov 20 09:03:09 ceph14-06 kernel: ffff88080f163a90 ffffffff81147340 0000000000000002 ffff88080f163a50
> Nov 20 09:03:09 ceph14-06 kernel: ffff88082ffd7e80 ffff88082ffd7e80 0000000000000002 dbf4eb51672ffc35
> Nov 20 09:03:09 ceph14-06 kernel: Call Trace:
> Nov 20 09:03:09 ceph14-06 kernel: <IRQ> [<ffffffff815e239b>] dump_stack+0x19/0x1b
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81147340>] warn_alloc_failed+0x110/0x180
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114b4dc>] __alloc_pages_nodemask+0x90c/0xb10
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8150941d>] ? ip_rcv_finish+0x7d/0x350
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81509ce4>] ? ip_rcv+0x234/0x380
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814d01c0>] ? netif_receive_skb+0x40/0xd0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81188349>] alloc_pages_current+0xa9/0x170
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114629e>] __get_free_pages+0xe/0x50
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff811930ee>] kmalloc_order_trace+0x2e/0xa0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814cfb32>] ? __netif_receive_skb_core+0x282/0x870
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81194749>] __kmalloc+0x219/0x230
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa0145bca>] bnx2x_frag_alloc.isra.65+0x2a/0x40 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa01461d4>] bnx2x_alloc_rx_data.isra.72+0x54/0x1c0 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: swapper/8: page allocation failure: order:2, mode:0x104020
>
> All the oopses seem to be triggered by page allocation failures.
>
> The effect of the oops is that the server has memory allocation errors
> all over the place, but mainly in the network stack. Not surprising,
> since that would be the major activity. I have set vm swappiness to 0
> on one node but it still generates the errors.
>
>              total       used       free     shared    buffers     cached
> Mem:      32732696   32507888     224808      51004          0   26187580
> -/+ buffers/cache:    6320308   26412388
> Swap:     31249404     308396   30941008
>
> Each oops is serious and affects the machine enough to trip nagios,
> which scans every 5 minutes. It would appear that the node doesn't
> respond to the network for many seconds.
>
> A couple of observations:
>
> It affects combined mon/osd servers as well as osd-only servers,
> although neither seems to be any more or less affected than the other.
>
> The OSD processes are affected on occasion, but they do not seem to be
> using excessive memory:
>
>   PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
> 13571 root      20   0 1847536 581636  4968 S   2.0  1.8 211:09.58 ceph-osd
> 13707 root      20   0 1803560 523904  4956 S   2.0  1.6 184:22.69 ceph-osd
> 13997 root      20   0 1905820 580768  5088 S   1.7  1.8 182:28.36 ceph-osd
> 13436 root      20   0 1783656 544400  5076 S   1.3  1.7 216:53.34 ceph-osd
> 13840 root      20   0 1778296 570400  4380 S   1.3  1.7 184:09.06 ceph-osd
> 14154 root      20   0 1881804 617748  5460 S   1.3  1.9 227:42.08 ceph-osd
> 14356 root      20   0 1906236 593936  4512 S   1.3  1.8 188:28.77 ceph-osd
> 14491 root      20   0 1837232 546140  4264 S   1.0  1.7 182:27.13 ceph-osd
>
> The main culprit seems to be the vm page cache.
>
> Any recommendations?
>
> Regards
> Darryl
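P.S.: An order:2 failure means the kernel could not find 4 physically
contiguous pages (16KB) for the bnx2x driver's receive buffers, even
though plenty of memory was nominally free; the problem is
fragmentation, not exhaustion. A quick way to keep an eye on that
(plain procfs, nothing Ceph specific):

---
# free blocks per allocation order (columns are order 0, 1, 2, ...);
# persistently low counts in the order >= 2 columns on a busy node
# mean atomic order:2 allocations are about to start failing
cat /proc/buddyinfo

# the reserve the kernel keeps free for exactly these allocations
sysctl vm.min_free_kbytes
---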
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/