Re: Kernel memory allocation oops Centos 7

On Mon, 24 Nov 2014 05:22:46 +0000 Bond, Darryl wrote:

> Or perhaps not :(
> Had occasion to remove a server and allow a subsequent re-balance of the
> cluster today. The box with 3.17.3 (and without min_free_kbytes set)
> oopsed and the others on the stock kernel with the setting did not.
> 
Well, at least it's good to know that.

Guess I'll keep cargo-culting that little setting for some time to come.
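For anyone wanting to try the same workaround: a minimal sketch of applying it, using the 524288 (512 MiB) value from Christian's sysctl.d snippet quoted below; the config file name here is illustrative, not from the thread:

```shell
# Sketch only: reserve ~512 MiB so the kernel keeps enough free pages
# for atomic/contiguous allocations. 524288 kB is the value from the
# sysctl.d snippet in this thread; tune per box.
min_free_kb=$((512 * 1024))
echo "vm.min_free_kbytes = ${min_free_kb}"

# As root, apply at runtime:
#   sysctl -w vm.min_free_kbytes=${min_free_kb}
# and persist across reboots (file name is illustrative):
#   echo "vm.min_free_kbytes = ${min_free_kb}" > /etc/sysctl.d/90-ceph-vm.conf
```

Note the trade-off: memory reserved this way is never available to the page cache, which is why it reads as cargo-culting rather than a real fix.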

> Darryl
> 
> ________________________________________
> From: Christian Balzer <chibi@xxxxxxx>
> Sent: Friday, 21 November 2014 2:39 PM
> To: 'ceph-users'
> Cc: Bond, Darryl
> Subject: Re:  Kernel memory allocation oops Centos 7
> 
> Hello,
> 
> On Fri, 21 Nov 2014 04:31:18 +0000 Bond, Darryl wrote:
> 
> > Using the standard CentOS 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is
> > a Broadcom 10G Ethernet, not InfiniBand. Tried swappiness = 0 without
> > any effect on this kernel.
> >
> I know, I read your original mail, that's why I suggested
> "vm/min_free_kbytes = 524288" and still do if you're stuck using some
> "older" kernel.
> 
> > I booted 3.17.3-1.el7.elrepo.x86_64 on one node about 3 hrs ago and
> > copied a lot of data onto the cluster. No sign of an oops on the
> > upgraded node; memory is now fully used, with no swap in use.
> 
> Looks like the compaction improvements in the newer kernels work...
> 
> Christian
> 
> >               total       used       free     shared    buffers     cached
> > Mem:       32897528   32625668     271860       9284       3928   25422672
> > -/+ buffers/cache:     7199068   25698460
> > Swap:      31249404          0   31249404
> >
> > Darryl
> >
> > ____________________________
> > From: Christian Balzer <chibi@xxxxxxx>
> > Sent: Friday, 21 November 2014 10:06 AM
> > To: 'ceph-users'
> > Cc: Bond, Darryl
> > Subject: Re:  Kernel memory allocation oops Centos 7
> >
> > On Thu, 20 Nov 2014 22:10:02 +0000 Bond, Darryl wrote:
> >
> > > Brief outline:
> > >
> > > 6-node production cluster. Each node: Dell R610, 8x 1.4TB SAS disks,
> > > Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
> > >
> > > Ceph 0.80.7-0.el7.centos from the ceph repositories.
> > >
> > Which kernel?
> >
> > Anyway, this has been discussed here very recently.
> > And I personally ran into this about 4 years ago when I first deployed
> > DRBD in combination with Mellanox InfiniBand HCAs.
> >
> > This is what makes things work for me (sysctl.d):
> > ---
> > # Don't swap on these boxes
> > #vm/swappiness = 0
> > #vm/vfs_cache_pressure = 50
> > vm/min_free_kbytes = 524288
> > ---
> >
> > As you can see, I initially played with swappiness and cache pressure
> > as well, but the real solution was and unfortunately still is to keep
> > LOTS of memory free.
> >
> > Regards,
> >
> > Christian
> > >
> > > About 10 times per day, each node will oops with the following
> > > message:
> > >
> > > An example:
> > >
> > > Nov 21 07:07:50 ceph14-04 kernel: warn_alloc_failed: 366 callbacks suppressed
> > > Nov 21 07:07:50 ceph14-04 kernel: swapper/4: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: kswapd0: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 5 PID: 176 Comm: kswapd0 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: systemd-journal: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 9 PID: 704 Comm: systemd-journal Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 4 PID: 0 Comm: swapper/4 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel: 0000000000104020 000000005835f665 ffff88080f0a3a00 ffffffff815e239b
> > > Nov 21 07:07:50 ceph14-04 kernel: ceph-osd: page allocation failure: order:2, mode:0x104020
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 21 07:07:50 ceph14-04 kernel:  0000000000104020
> > > Nov 21 07:07:50 ceph14-04 kernel: CPU: 0 PID: 7453 Comm: ceph-osd Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 21 07:07:50 ceph14-04 kernel: ffff88080f0a3a90
> > > Nov 21 07:07:50 ceph14-04 kernel:  0000000000104020
> > > Nov 21 07:07:50 ceph14-04 kernel: 000000009c9142fd
> > > Nov 21 07:07:50 ceph14-04 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > >
> > > or another example:
> > >
> > > Nov 20 09:03:09 ceph14-06 kernel: warn_alloc_failed: 3803 callbacks suppressed
> > > Nov 20 09:03:09 ceph14-06 kernel: swapper/11: page allocation failure: order:2, mode:0x104020
> > > Nov 20 09:03:09 ceph14-06 kernel: CPU: 11 PID: 0 Comm: swapper/11 Not tainted 3.10.0-123.9.3.el7.x86_64 #1
> > > Nov 20 09:03:09 ceph14-06 kernel: Hardware name: Dell Inc. PowerEdge R620/01W23F, BIOS 2.2.2 01/16/2014
> > > Nov 20 09:03:09 ceph14-06 kernel: 0000000000104020 dbf4eb51672ffc35 ffff88080f163a00 ffffffff815e239b
> > > Nov 20 09:03:09 ceph14-06 kernel: ffff88080f163a90 ffffffff81147340 0000000000000002 ffff88080f163a50
> > > Nov 20 09:03:09 ceph14-06 kernel: ffff88082ffd7e80 ffff88082ffd7e80 0000000000000002 dbf4eb51672ffc35
> > > Nov 20 09:03:09 ceph14-06 kernel: Call Trace:
> > > Nov 20 09:03:09 ceph14-06 kernel:  <IRQ>  [<ffffffff815e239b>] dump_stack+0x19/0x1b
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff81147340>] warn_alloc_failed+0x110/0x180
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff8114b4dc>] __alloc_pages_nodemask+0x90c/0xb10
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff8150941d>] ? ip_rcv_finish+0x7d/0x350
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff81509ce4>] ? ip_rcv+0x234/0x380
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff814d01c0>] ? netif_receive_skb+0x40/0xd0
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff81188349>] alloc_pages_current+0xa9/0x170
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff8114629e>] __get_free_pages+0xe/0x50
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff811930ee>] kmalloc_order_trace+0x2e/0xa0
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff814cfb32>] ? __netif_receive_skb_core+0x282/0x870
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffff81194749>] __kmalloc+0x219/0x230
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffffa0145bca>] bnx2x_frag_alloc.isra.65+0x2a/0x40 [bnx2x]
> > > Nov 20 09:03:09 ceph14-06 kernel:  [<ffffffffa01461d4>] bnx2x_alloc_rx_data.isra.72+0x54/0x1c0 [bnx2x]
> > > Nov 20 09:03:09 ceph14-06 kernel: swapper/8: page allocation failure: order:2, mode:0x104020
> > >
> > > All the oopses seem to be triggered by page allocation failures.
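(For readers hitting the same thing: "order:2" means the allocator needed 2^2 = 4 physically contiguous pages, i.e. 16 KiB, which fails under fragmentation even with plenty of total free memory. A hedged sketch, not from the thread, of parsing a /proc/buddyinfo line to see free blocks per order; the sample line is hardcoded and illustrative, not from the affected hosts:)

```python
# Sketch: inspect per-order free block counts as /proc/buddyinfo reports
# them; a sustained shortage at order >= 2 matches these failures.
# Sample line is illustrative, not from ceph14-0x.
SAMPLE = "Node 0, zone   Normal   4  2  1  0  0  0  0  0  0  0  0"

def parse_buddyinfo(line):
    """Return {order: free_block_count} for one /proc/buddyinfo line."""
    counts = line.split()[4:]  # skip the "Node N, zone NAME" fields
    return {order: int(n) for order, n in enumerate(counts)}

free = parse_buddyinfo(SAMPLE)
# An order-2 request (16 KiB, e.g. the bnx2x RX buffers in the trace)
# can only be satisfied from a free block of order >= 2.
headroom = sum(count for order, count in free.items() if order >= 2)
print(headroom)  # prints 1 for the sample line: badly fragmented
```

On a live box the same parse would be run over each line of `open("/proc/buddyinfo")`.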
> > >
> > > The effect of the oops is that the server hits memory allocation
> > > errors all over the place, mainly in the network stack. Not
> > > surprising, since that is the major activity. I have set vm
> > > swappiness to 0 on one node but it still generates the errors.
> > >
> > >               total       used       free     shared    buffers     cached
> > > Mem:       32732696   32507888     224808      51004          0   26187580
> > > -/+ buffers/cache:     6320308   26412388
> > > Swap:      31249404     308396   30941008
> > >
> > >
> > >
> > > Each oops is serious enough to affect the machine and trip Nagios,
> > > which scans every 5 minutes. It appears the node stops responding
> > > on the network for many seconds.
> > >
> > > A couple of observations:
> > > It affects the combined mon/osd servers as well as the osd-only
> > > servers, though neither group seems more or less affected.
> > >
> > > The OSD processes are affected on occasions but they do not seem to
> > > be using excessive memory:
> > >
> > >   PID USER      PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
> > > 13571 root      20   0 1847536 581636  4968 S  2.0  1.8 211:09.58 ceph-osd
> > > 13707 root      20   0 1803560 523904  4956 S  2.0  1.6 184:22.69 ceph-osd
> > > 13997 root      20   0 1905820 580768  5088 S  1.7  1.8 182:28.36 ceph-osd
> > > 13436 root      20   0 1783656 544400  5076 S  1.3  1.7 216:53.34 ceph-osd
> > > 13840 root      20   0 1778296 570400  4380 S  1.3  1.7 184:09.06 ceph-osd
> > > 14154 root      20   0 1881804 617748  5460 S  1.3  1.9 227:42.08 ceph-osd
> > > 14356 root      20   0 1906236 593936  4512 S  1.3  1.8 188:28.77 ceph-osd
> > > 14491 root      20   0 1837232 546140  4264 S  1.0  1.7 182:27.13 ceph-osd
> > >
> > > The main culprit seems to be the vm page cache.
> > >
> > > Any recommendations?
> > >
> > > Regards
> > > Darryl
> > >
> > >
> > >
> > > ________________________________
> > >
> > > The contents of this electronic message and any attachments are
> > > intended only for the addressee and may contain legally privileged,
> > > personal, sensitive or confidential information. If you are not the
> > > intended addressee, and have received this email, any transmission,
> > > distribution, downloading, printing or photocopying of the contents
> > > of this message or attachments is strictly prohibited. Any legal
> > > privilege or confidentiality attached to this message and attachments
> > > is not waived, lost or destroyed by reason of delivery to any person
> > > other than intended addressee. If you have received this message and
> > > are not the intended addressee you should notify the sender by return
> > > email and destroy all copies of the message and any attachments.
> > > Unless expressly attributed, the views expressed in this email do not
> > > necessarily represent the views of the company.
> > >
> >
> >
> >
> >
> 
> 
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



