Re: FCOE vn2vn memory leaks in 4.14

Hi,

On Thu, Jul 26, 2018 at 03:36:22PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 03:02:14PM +0200, ard wrote:
> > Hi Guys,
> > 
> > I sent this to fcoe-devel but it might be holiday season or the
> > mailing list is abandoned, as the volume of emails concerning fcoe is
> > pretty low.
> 
> Yes, the list is defunct as I didn't get admin privileges passed by
> the old Maintainer when I took over.

That explains :-).

> Anyways, can you please enable the kernel memory leak detector [1] and
> possibly even try a more up to date (like v4.18-rc6) kernel?
> 
> [1] https://www.kernel.org/doc/html/v4.17/dev-tools/kmemleak.html

The up to date kernel would be a problem.
The kmemleak log is here:
https://github.com/hardkernel/linux/files/2218589/kmemleak.txt
Sorry that github doesn't do a preview.
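
For anyone who does not want to read the whole log: a rough sketch of
how the backtrace frames could be tallied per function. It assumes the
backtrace lines have the "[<address>] function+offset/size [module]"
form seen in the excerpt quoted below; count_leak_frames is just a
helper name I made up for illustration, not part of any tool.

```shell
# Tally how often each function shows up in kmemleak backtraces.
# Assumption: backtrace lines look like
#     [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
# count_leak_frames is a hypothetical helper name.
count_leak_frames() {
    # $1: path to a saved copy of the kmemleak report
    grep -E '^[[:space:]]*\[<[0-9a-f]+>\]' "$1" |
        awk '{ sub(/\+0x.*/, "", $2); print $2 }' |   # strip +offset/size
        sort | uniq -c | sort -rn                      # most frequent first
}
```

Something like "count_leak_frames kmemleak.txt | head -n 20" should then
list the most frequent frames first. Note that this counts every frame
of every backtrace, not just the allocating function.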

The system itself is an Exynos 5422 (ARM). It worked perfectly fine
with 3.10 as an initiator; now it leaks memory the moment I
enable the FCoE VLAN on the port.

<offtopic>
I also have an ARMv5 box running 3.7.1 (Intel ss4000e) that works
fine as a stable target.

The ARM initiator was able to crash my D525 target, which was then
running 4.0, just by mounting btrfs. The target now runs 4.3
and has been stable ever since.
</offtopic>

The main issue seems to be in fcoe_ctlr.c, and that file has not
really been touched, apart from broad sweeps for generic kernel
maintenance.

What I can do is compile a 4.14 and a 4.18 kernel for my main
initiator, a desktop that has an SSD used as bcache for FCoE
drives. That desktop is currently turned off, however, because of a
heatwave. The last known working kernel on that system was 3.18. I
will compile a new one.

> Thanks a lot,
>        Johannes

Well, thank you for maintaining a life saver.

> 
> > 
> > On Mon, Jul 23, 2018 at 02:16:31PM +0200, ard wrote:
> > Date: Mon, 23 Jul 2018 14:16:31 +0200
> > From: ard <ard@xxxxxxxxx>
> > Subject: FCOE vn2vn memory leaks in 4.14
> > To: fcoe-devel@xxxxxxxxxxxxx
> > 
> > Hi guys,
> > 
> > After an upgrade of one of my systems from 3.10 to 4.14.55, I
> > noticed a serious memory leak.
> > As this kernel is not 100% vanilla, I started the bug report
> > here:
> > https://github.com/hardkernel/linux/issues/360
> > 
> > The essence is this:
> > I have an FCoE interface assigned to a vlan on a nic.
> > These were remnants of a test I did. The FCoE was still
> > configured, but no targets were exported to that endpoint.
> > So it would see and join multicast announcements of 2 other
> > systems, but do nothing with it.
> > This was good enough to waste about 600MB of memory in 2 or 3
> > days.
> > Some things have changed, maybe the number of announcements (due
> > to the heat I turn off systems), or really something in the
> > kernel. But after 1 week I really have to proactively reboot the
> > system in order to avoid OOMs.
> > I've now disabled the FCoE vlan on the port of that system,
> > so it won't get any broadcasts.
> > No memory leaks so far.
> > The kmemleak is in that bug report, I won't mail it, since it's
> > 2.5MB.
> > The gist seems to be:
> >   backtrace:
> >     [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
> >     [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
> >     [<bf33a400>] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
> >     [<c013dbb0>] process_one_work+0x138/0x4bc
> > 
> > These seem to stand out:
> > root@odroid5:~# grep -c fcoe_ctlr_vn_add kmemleak.txt;grep -c fcoe_fip_vlan_recv kmemleak.txt 
> > 1090
> > 898
> > 
> > So there are 2 leaks: network skb leaks, I presume, and fcoe structure leaks.
> > Except for one system that I turn off and on once a day, all other systems are
> > running stably (older kernels though).
> > 
> > The system I turn off and on again also has some vn2vn problems, and that's also
> > a 4.14 kernel.
> > (steam machine with steamos kernel, fcoe not actively used, but with a bcache
> > on one of the targets, it probably auto registers a dependency)
> > This is outside the scope of this ticket though.
> > 
> > The system with the memory leak is a system intended to run 24/7.
> > 
> > If anyone can point me to the right place, or help me...
> > 
> > Regards,
> > Ard van Breemen
> > 
> > -- 
> > .signature not found
> 
> -- 
> Johannes Thumshirn                                          Storage
> jthumshirn@xxxxxxx                                +49 911 74053 689
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
> 

-- 
.signature not found