On Mon, 24 Nov 2014 05:22:46 +0000 Bond, Darryl wrote:

> Or perhaps not :(
> Had occasion to remove a server and allow a subsequent re-balance of the
> cluster today. The box with 3.17.3 (and without min_free_kbytes set)
> oopsed and the others on the stock kernel with the setting did not.
>
Well, at least it's good to know that.
Guess I'll keep cargo-culting that little setting for some time to come.

> Darryl
>
> ________________________________________
> From: Christian Balzer <chibi@xxxxxxx>
> Sent: Friday, 21 November 2014 2:39 PM
> To: 'ceph-users'
> Cc: Bond, Darryl
> Subject: Re: Kernel memory allocation oops Centos 7
>
> Hello,
>
> On Fri, 21 Nov 2014 04:31:18 +0000 Bond, Darryl wrote:
>
> > Using the standard Centos 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is
> > a 10G Ethernet broadcom so not infiniband. Tried swappiness = 0 without
> > any effect on this kernel.
> >
> I know, I read your original mail, that's why I suggested
> "vm/min_free_kbytes = 524288" and still do if you're stuck using some
> "older" kernel.
>
> > I booted 3.17.3-1.el7.elrepo.x86_64 on one node about 3 hrs ago and
> > copied a lot of data onto the cluster. No sign of an oops on the
> > upgraded node and memory is now all used and no swap used.
> >
> Looks like the compaction improvements in the newer kernels work...
>
> Christian
>
> >              total       used       free     shared    buffers     cached
> > Mem:      32897528   32625668     271860       9284       3928   25422672
> > -/+ buffers/cache:    7199068   25698460
> > Swap:     31249404          0   31249404
> >
> > Darryl
> >
> > ____________________________
> > From: Christian Balzer <chibi@xxxxxxx>
> > Sent: Friday, 21 November 2014 10:06 AM
> > To: 'ceph-users'
> > Cc: Bond, Darryl
> > Subject: Re: Kernel memory allocation oops Centos 7
> >
> > On Thu, 20 Nov 2014 22:10:02 +0000 Bond, Darryl wrote:
> >
> > > Brief outline:
> > >
> > > 6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks,
> > > Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
> > >
> > > Ceph 0.80.7-0.el7.centos from the ceph repositories.
> > >
> > Which kernel?
> >
> > Anyways, this has been discussed here very recently.
> > And I personally ran into this about 4 years ago when I first deployed
> > DRBD in combination with Mellanox Infiniband HCAs.
> >
> > This is what makes things work for me (sysctl.d):
> > ---
> > # Don't swap on these boxes
> > #vm/swappiness = 0
> > #vm/vfs_cache_pressure = 50
> > vm/min_free_kbytes = 524288
> > ---
> >
> > As you can see, I initially played with swappiness and cache pressure
> > as well, but the real solution was and unfortunately still is to keep
> > LOTS of memory free.
> >
> > Regards,
> >
> > Christian
> > >
> > > About 10 times per day, each node will oops with the following
> > > message:
> > >
> > > An example:
> > >
> > > Nov 21 07:07:50 ceph14-04 kernel: warn_alloc_failed: 366 callbacks suppressed
> > > Nov 21 07:07:50 ceph14-04 kernel: swapper/4: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: kswapd0: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 5 PID: 176 Comm: kswapd0 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: systemd-journal: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 9 PID: 704 Comm: systemd-journal Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020 000000005835f665 ffff88080f0a3a00 ffffffff815e239b
> > > Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 0 PID: 7453 Comm: ceph-osd Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: ffff88080f0a3a90
> > > Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020
> > > Nov 21 07:07:50 ceph14-04 kernel: 000000009c9142fd
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > >
> > > or another example:
> > >
> > > Nov 20 09:03:09 ceph14-06 kernel: warn_alloc_failed: 3803 callbacks suppressed
> > > Nov 20 09:03:09 ceph14-06 kernel: swapper/11: page allocation failure: order:2, mode:0x104020
> > > Nov 20 09:03:09 ceph14-06 kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 20 09:03:09 ceph14-06 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 20 09:03:09 ceph14-06 kernel: 0000000000104020 dbf4eb51672ffc35 ffff88080f163a00 ffffffff815e239b
> > > Nov 20 09:03:09 ceph14-06 kernel: ffff88080f163a90 ffffffff81147340 0000000000000002 ffff88080f163a50
> > > Nov 20 09:03:09 ceph14-06 kernel: ffff88082ffd7e80 ffff88082ffd7e80 0000000000000002 dbf4eb51672ffc35
> > > Nov 20 09:03:09 ceph14-06 kernel: Call Trace:
> > > Nov 20 09:03:09 ceph14-06 kernel: <IRQ> [<ffffffff815e239b>] dump_stack+0x19/0x1b
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81147340>] warn_alloc_failed+0x110/0x180
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114b4dc>] __alloc_pages_nodemask+0x90c/0xb10
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8150941d>] ? ip_rcv_finish+0x7d/0x350
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81509ce4>] ? ip_rcv+0x234/0x380
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814d01c0>] ? netif_receive_skb+0x40/0xd0
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81188349>] alloc_pages_current+0xa9/0x170
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff8114629e>] __get_free_pages+0xe/0x50
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff811930ee>] kmalloc_order_trace+0x2e/0xa0
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff814cfb32>] ? __netif_receive_skb_core+0x282/0x870
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffff81194749>] __kmalloc+0x219/0x230
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa0145bca>] bnx2x_frag_alloc.isra.65+0x2a/0x40 [bnx2x]
> > > Nov 20 09:03:09 ceph14-06 kernel: [<ffffffffa01461d4>] bnx2x_alloc_rx_data.isra.72+0x54/0x1c0 [bnx2x]
> > > Nov 20 09:03:09 ceph14-06 kernel: swapper/8: page allocation failure: order:2, mode:0x104020
> > >
> > > All oops seem to be triggered by page allocation failure.
> > >
> > > The effect of the oops is that the server has memory allocation
> > > errors all over the place, but mainly in the network stack. Not
> > > surprising since that would be the major activity. I have set vm
> > > swappiness to 0 on one node but it still generates the errors.
> > >
> > > Mem:      32732696   32507888     224808      51004          0   26187580
> > > -/+ buffers/cache:    6320308   26412388
> > > Swap:     31249404     308396   30941008
> > >
> > >
> > >
> > > Each oops is serious and affects the machine enough to trip nagios
> > > which scans each 5 minutes. It would appear that the node doesn't
> > > respond to the network for many seconds.
> > >
> > > ?
> > >
> > >
> > > A couple of observations:
> > > Affects mon/osd servers as well as just osd servers, although they
> > > don't seem to be any more or less affected.
> > >
> > > The OSD processes are affected on occasions but they do not seem to
> > > be using excessive memory
> > >   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
> > > 13571 root      20   0 1847536 581636   4968 S  2.0  1.8 211:09.58 ceph-osd
> > > 13707 root      20   0 1803560 523904   4956 S  2.0  1.6 184:22.69 ceph-osd
> > > 13997 root      20   0 1905820 580768   5088 S  1.7  1.8 182:28.36 ceph-osd
> > > 13436 root      20   0 1783656 544400   5076 S  1.3  1.7 216:53.34 ceph-osd
> > > 13840 root      20   0 1778296 570400   4380 S  1.3  1.7 184:09.06 ceph-osd
> > > 14154 root      20   0 1881804 617748   5460 S  1.3  1.9 227:42.08 ceph-osd
> > > 14356 root      20   0 1906236 593936   4512 S  1.3  1.8 188:28.77 ceph-osd
> > > 14491 root      20   0 1837232 546140   4264 S  1.0  1.7 182:27.13 ceph-osd
> > >
> > > The main culprit seems to be the vm page cache.
> > >
> > > Any recommendations?
> > >
> > > Regards
> > > Darryl
> > >
> > >
> > >
> > > ________________________________
> > >
> > > The contents of this electronic message and any attachments are
> > > intended only for the addressee and may contain legally privileged,
> > > personal, sensitive or confidential information.
> > > If you are not the intended addressee, and have received this email,
> > > any transmission, distribution, downloading, printing or photocopying
> > > of the contents of this message or attachments is strictly prohibited.
> > > Any legal privilege or confidentiality attached to this message and
> > > attachments is not waived, lost or destroyed by reason of delivery to
> > > any person other than intended addressee. If you have received this
> > > message and are not the intended addressee you should notify the
> > > sender by return email and destroy all copies of the message and any
> > > attachments. Unless expressly attributed, the views expressed in this
> > > email do not necessarily represent the views of the company.
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> >
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> >
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
>

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
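
For anyone chasing the same "order:2" failures: an order-2 request needs four physically contiguous pages (16 KiB with 4 KiB pages), and in the trace above it comes from the bnx2x receive path in interrupt context. One rough way to see whether a node is running short of such blocks is to look at /proc/buddyinfo, which lists free blocks per order for each zone. The following is only a minimal Python sketch under stated assumptions: the function names are hypothetical, the sample text is made up (not taken from the cluster in this thread), and a non-zero count does not guarantee an atomic allocation will succeed, since the zone watermarks (which min_free_kbytes raises) also apply.

```python
from typing import Dict, List, Tuple

# Made-up sample in /proc/buddyinfo layout; NOT data from this thread.
# Columns after the zone name are free block counts for order 0, 1, 2, ...
SAMPLE_BUDDYINFO = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   4000   1200     10      0      0      0      0      0      0      0      0
Node 0, zone   Normal  90000  15000      0      0      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text: str) -> Dict[Tuple[int, str], List[int]]:
    """Map (node, zone) to the list of free block counts, indexed by order."""
    zones: Dict[Tuple[int, str], List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: "Node", "<n>,", "zone", "<name>", count0, count1, ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        zones[(node, fields[3])] = [int(count) for count in fields[4:]]
    return zones


def free_blocks_at_or_above(counts: List[int], order: int) -> int:
    """Count free blocks large enough to satisfy a request of this order."""
    return sum(counts[order:])


def starved_zones(text: str, order: int = 2) -> List[Tuple[int, str]]:
    """Return the (node, zone) pairs with no free block of at least this order."""
    return [
        key
        for key, counts in parse_buddyinfo(text).items()
        if free_blocks_at_or_above(counts, order) == 0
    ]


if __name__ == "__main__":
    # With the made-up sample, only the Normal zone has no order>=2 block left.
    print(starved_zones(SAMPLE_BUDDYINFO, order=2))
```

On a live node the same function could be fed the real file, e.g. starved_zones(open("/proc/buddyinfo").read()), and sampled during a re-balance to see how quickly the higher orders drain.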