Using the standard CentOS 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is a
Broadcom 10G Ethernet, so not InfiniBand. I tried swappiness = 0 without
any effect on this kernel.

I booted 3.17.3-1.el7.elrepo.x86_64 on one node about 3 hours ago and
copied a lot of data onto the cluster. No sign of an oops on the upgraded
node; memory is now all used and no swap is in use:

             total       used       free     shared    buffers     cached
Mem:      32897528   32625668     271860       9284       3928   25422672
-/+ buffers/cache:    7199068   25698460
Swap:     31249404          0   31249404

Darryl

____________________________
From: Christian Balzer <chibi@xxxxxxx>
Sent: Friday, 21 November 2014 10:06 AM
To: 'ceph-users'
Cc: Bond, Darryl
Subject: Re: Kernel memory allocation oops Centos 7

On Thu, 20 Nov 2014 22:10:02 +0000 Bond, Darryl wrote:

> Brief outline:
>
> 6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks,
> Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
>
> Ceph 0.80.7-0.el7.centos from the ceph repositories.
>
Which kernel?

Anyways, this has been discussed here very recently. And I personally ran
into this about 4 years ago when I first deployed DRBD in combination
with Mellanox InfiniBand HCAs.

This is what makes things work for me (sysctl.d):
---
# Don't swap on these boxes
#vm/swappiness = 0
#vm/vfs_cache_pressure = 50
vm/min_free_kbytes = 524288
---
As you can see, I initially played with swappiness and cache pressure as
well, but the real solution was, and unfortunately still is, to keep LOTS
of memory free.

Regards,

Christian
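A practical aside: the setting above can be applied to a running CentOS 7
node without a reboot and persisted via sysctl.d. A minimal sketch, not
from the original exchange; the file name is illustrative:
---
# Apply immediately, no reboot needed.
sysctl -w vm.min_free_kbytes=524288

# Persist across reboots. sysctl accepts both dot and slash notation,
# so vm.min_free_kbytes and vm/min_free_kbytes are equivalent.
echo 'vm.min_free_kbytes = 524288' > /etc/sysctl.d/90-min-free.conf

# Reload all settings under /etc/sysctl.d (procps-ng on CentOS 7),
# then verify the current value.
sysctl --system
sysctl vm.min_free_kbytes
---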
> About 10 times per day, each node will oops with the following message.
>
> An example:
>
> Nov 21 07:07:50 ceph14-04 kernel: warn_alloc_failed: 366 callbacks suppressed
> Nov 21 07:07:50 ceph14-04 kernel: swapper/4: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: kswapd0: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 5 PID: 176 Comm: kswapd0 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: systemd-journal: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 9 PID: 704 Comm: systemd-journal Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020 000000005835f665 ffff88080f0a3a00 ffffffff815e239b
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 0 PID: 7453 Comm: ceph-osd Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: ffff88080f0a3a90
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: 000000009c9142fd
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
>
> or another example:
>
> Nov 20 09:03:09 ceph14-06 kernel: warn_alloc_failed: 3803 callbacks suppressed
> Nov 20 09:03:09 ceph14-06 kernel: swapper/11: page allocation failure: order:2, mode:0x104020
> Nov 20 09:03:09 ceph14-06 kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 20 09:03:09 ceph14-06 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 20 09:03:09 ceph14-06 kernel: 0000000000104020 dbf4eb51672ffc35 ffff88080f163a00 ffffffff815e239b
> Nov 20 09:03:09 ceph14-06 kernel: ffff88080f163a90 ffffffff81147340 0000000000000002 ffff88080f163a50
> Nov 20 09:03:09 ceph14-06 kernel: ffff88082ffd7e80 ffff88082ffd7e80 0000000000000002 dbf4eb51672ffc35
> Nov 20 09:03:09 ceph14-06 kernel: Call Trace:
> Nov 20 09:03:09 ceph14-06 kernel: <IRQ> [<ffffffff815e239b>] dump_stack+0x19/0x1b
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81147340>] warn_alloc_failed+0x110/0x180
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114b4dc>] __alloc_pages_nodemask+0x90c/0xb10
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8150941d>] ? ip_rcv_finish+0x7d/0x350
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81509ce4>] ? ip_rcv+0x234/0x380
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814d01c0>] ? netif_receive_skb+0x40/0xd0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81188349>] alloc_pages_current+0xa9/0x170
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114629e>] __get_free_pages+0xe/0x50
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff811930ee>] kmalloc_order_trace+0x2e/0xa0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814cfb32>] ? __netif_receive_skb_core+0x282/0x870
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81194749>] __kmalloc+0x219/0x230
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa0145bca>] bnx2x_frag_alloc.isra.65+0x2a/0x40 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa01461d4>] bnx2x_alloc_rx_data.isra.72+0x54/0x1c0 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: swapper/8: page allocation failure: order:2, mode:0x104020
>
> All oopses seem to be triggered by page allocation failures.
>
> The effect of the oops is that the server has memory allocation errors
> all over the place, but mainly in the network stack. Not surprising,
> since that would be the major activity. I have set vm swappiness to 0
> on one node but it still generates the errors.
>
>              total       used       free     shared    buffers     cached
> Mem:      32732696   32507888     224808      51004          0   26187580
> -/+ buffers/cache:    6320308   26412388
> Swap:     31249404     308396   30941008
>
> Each oops is serious and affects the machine enough to trip Nagios,
> which scans every 5 minutes. It would appear that the node doesn't
> respond to the network for many seconds.
>
> A couple of observations:
>
> It affects combined mon/osd servers as well as osd-only servers,
> although neither seems to be any more or less affected than the other.
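A note on reading the traces above: these are warn_alloc_failed warnings
rather than true oopses, and they are order:2 (16 KiB contiguous)
allocations made in interrupt context (<IRQ> in the trace) by the bnx2x
receive path, so they cannot sleep and reclaim memory. Whether blocks of
that size are still available can be checked per allocation order via
/proc/buddyinfo. A quick sketch, not from the original thread:
---
# The columns after the zone name count free blocks of order 0..10
# (4 KiB, 8 KiB, 16 KiB, ...). A near-zero third count means order:2
# allocations like bnx2x_frag_alloc's are likely to fail.
cat /proc/buddyinfo

# Watch fragmentation while the cluster is under load (2 s interval).
watch -n 2 cat /proc/buddyinfo
---
Keeping min_free_kbytes high, as suggested above, is what keeps such
low-order blocks in reserve for atomic allocations.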
> The OSD processes are affected on occasion, but they do not seem to be
> using excessive memory:
>
>   PID USER   PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
> 13571 root   20   0 1847536 581636  4968 S   2.0  1.8 211:09.58 ceph-osd
> 13707 root   20   0 1803560 523904  4956 S   2.0  1.6 184:22.69 ceph-osd
> 13997 root   20   0 1905820 580768  5088 S   1.7  1.8 182:28.36 ceph-osd
> 13436 root   20   0 1783656 544400  5076 S   1.3  1.7 216:53.34 ceph-osd
> 13840 root   20   0 1778296 570400  4380 S   1.3  1.7 184:09.06 ceph-osd
> 14154 root   20   0 1881804 617748  5460 S   1.3  1.9 227:42.08 ceph-osd
> 14356 root   20   0 1906236 593936  4512 S   1.3  1.8 188:28.77 ceph-osd
> 14491 root   20   0 1837232 546140  4264 S   1.0  1.7 182:27.13 ceph-osd
>
> The main culprit seems to be the VM page cache.
>
> Any recommendations?
>
> Regards
> Darryl

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com