On Thu, 20 Nov 2014 22:10:02 +0000 Bond, Darryl wrote:

> Brief outline:
>
> 6 node production cluster. Each node Dell R610, 8x1.4TB SAS disks,
> Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
>
> Ceph 0.80.7-0.el7.centos from the ceph repositories.
>
Which kernel?

Anyways, this has been discussed here very recently.
And I personally ran into this about 4 years ago when I first deployed
DRBD in combination with Mellanox Infiniband HCAs.

This is what makes things work for me (sysctl.d):
---
# Don't swap on these boxes
#vm/swappiness = 0
#vm/vfs_cache_pressure = 50
vm/min_free_kbytes = 524288
---
As you can see, I initially played with swappiness and cache pressure
as well, but the real solution was, and unfortunately still is, to keep
LOTS of memory free.
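For reference, this is roughly how you would apply such a setting at
runtime and make it persistent (the file name below is just an example,
and 524288 is the value that happens to work for my hardware; size it
to your own boxes):

---
# apply immediately, runtime only
sysctl -w vm.min_free_kbytes=524288

# persist across reboots, then load the file
echo "vm.min_free_kbytes = 524288" > /etc/sysctl.d/90-min-free.conf
sysctl -p /etc/sysctl.d/90-min-free.conf
---

The kernel dips into this reserve for atomic allocations like the ones
failing in your logs, so too low a value brings the failures back,
while too high a value just wastes RAM on a 32GB box.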
Regards,

Christian

> About 10 times per day, each node will oops with the following
> message.
>
> An example:
>
> Nov 21 07:07:50 ceph14-04 kernel: warn_alloc_failed: 366 callbacks suppressed
> Nov 21 07:07:50 ceph14-04 kernel: swapper/4: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: kswapd0: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 5 PID: 176 Comm: kswapd0 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: systemd-journal: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 9 PID: 704 Comm: systemd-journal Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020 000000005835f665 ffff88080f0a3a00 ffffffff815e239b
> Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: CPU: 0 PID: 7453 Comm: ceph-osd Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 21 07:07:50 ceph14-04 kernel: ffff88080f0a3a90
> Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> Nov 21 07:07:50 ceph14-04 kernel: 000000009c9142fd
> Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
>
> or another example:
>
> Nov 20 09:03:09 ceph14-06 kernel: warn_alloc_failed: 3803 callbacks suppressed
> Nov 20 09:03:09 ceph14-06 kernel: swapper/11: page allocation failure: order:2, mode:0x104020
> Nov 20 09:03:09 ceph14-06 kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> Nov 20 09:03:09 ceph14-06 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> Nov 20 09:03:09 ceph14-06 kernel: 0000000000104020 dbf4eb51672ffc35 ffff88080f163a00 ffffffff815e239b
> Nov 20 09:03:09 ceph14-06 kernel: ffff88080f163a90 ffffffff81147340 0000000000000002 ffff88080f163a50
> Nov 20 09:03:09 ceph14-06 kernel: ffff88082ffd7e80 ffff88082ffd7e80 0000000000000002 dbf4eb51672ffc35
> Nov 20 09:03:09 ceph14-06 kernel: Call Trace:
> Nov 20 09:03:09 ceph14-06 kernel: <IRQ> [<ffffffff815e239b>] dump_stack+0x19/0x1b
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81147340>] warn_alloc_failed+0x110/0x180
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114b4dc>] __alloc_pages_nodemask+0x90c/0xb10
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8150941d>] ? ip_rcv_finish+0x7d/0x350
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81509ce4>] ? ip_rcv+0x234/0x380
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814d01c0>] ? netif_receive_skb+0x40/0xd0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81188349>] alloc_pages_current+0xa9/0x170
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114629e>] __get_free_pages+0xe/0x50
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff811930ee>] kmalloc_order_trace+0x2e/0xa0
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814cfb32>] ? __netif_receive_skb_core+0x282/0x870
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81194749>] __kmalloc+0x219/0x230
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa0145bca>] bnx2x_frag_alloc.isra.65+0x2a/0x40 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa01461d4>] bnx2x_alloc_rx_data.isra.72+0x54/0x1c0 [bnx2x]
> Nov 20 09:03:09 ceph14-06 kernel: swapper/8: page allocation failure: order:2, mode:0x104020
>
> All the oopses seem to be triggered by page allocation failures.
>
> The effect of the oops is that the server has memory allocation errors
> all over the place, but mainly in the network stack. Not surprising,
> since that would be the major activity. I have set vm swappiness to 0
> on one node but it still generates the errors.
>
>              total       used       free     shared    buffers     cached
> Mem:      32732696   32507888     224808      51004          0   26187580
> -/+ buffers/cache:    6320308   26412388
> Swap:     31249404     308396   30941008
>
> Each oops is serious and affects the machine enough to trip nagios,
> which scans every 5 minutes. It would appear that the node doesn't
> respond to the network for many seconds.
>
> A couple of observations:
>
> It affects combined mon/osd servers as well as osd-only servers,
> although neither seems to be any more or less affected than the other.
>
> The OSD processes are affected on occasion, but they do not seem to be
> using excessive memory:
>
>   PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
> 13571 root      20   0 1847536 581636  4968 S   2.0  1.8 211:09.58 ceph-osd
> 13707 root      20   0 1803560 523904  4956 S   2.0  1.6 184:22.69 ceph-osd
> 13997 root      20   0 1905820 580768  5088 S   1.7  1.8 182:28.36 ceph-osd
> 13436 root      20   0 1783656 544400  5076 S   1.3  1.7 216:53.34 ceph-osd
> 13840 root      20   0 1778296 570400  4380 S   1.3  1.7 184:09.06 ceph-osd
> 14154 root      20   0 1881804 617748  5460 S   1.3  1.9 227:42.08 ceph-osd
> 14356 root      20   0 1906236 593936  4512 S   1.3  1.8 188:28.77 ceph-osd
> 14491 root      20   0 1837232 546140  4264 S   1.0  1.7 182:27.13 ceph-osd
>
> The main culprit seems to be the vm page cache.
>
> Any recommendations?
>
> Regards
> Darryl
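P.S.: An order:2 failure means the kernel could not find 4 physically
contiguous pages (16KB) for the bnx2x driver's receive buffers, even
though plenty of memory was nominally free; the problem is
fragmentation, not exhaustion. A quick way to keep an eye on that
(plain procfs, nothing Ceph specific):

---
# free blocks per allocation order (columns are order 0, 1, 2, ...);
# persistently low counts in the order >= 2 columns on a busy node
# mean atomic order:2 allocations are about to start failing
cat /proc/buddyinfo

# the reserve the kernel keeps free for exactly these allocations
sysctl vm.min_free_kbytes
---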
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/