Re: FCOE vn2vn memory leaks in 4.14

Johannes Thumshirn <jthumshirn@xxxxxxx> · Fri, 27 Jul 2018 11:26:15 +0200

On Fri, Jul 27, 2018 at 12:49:55AM +0200, ard wrote:
> Hi,
> 
> On Thu, Jul 26, 2018 at 05:05:37PM +0200, Johannes Thumshirn wrote:
> > On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > > The system itself is an Exynos 5422 (ARM). It worked perfectly
> > > fine as an initiator with 3.10; now it leaks memory the moment I
> > > enable the FCoE VLAN on the port.
> > 
> > So I had a look through the commits between v3.10 and v4.14 and this
> > one sticks out:
> > ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")
> > 
> > While I think it is necessary to release an skb with kfree_skb(), it
> > might still be worth reverting it for a test run.
> 
> So I recompiled for the desktop (i920), and fortunately after 2 hours
> it was already collecting memory leaks.
> This at least clears up a few unknowns for me:
> 1) USB vs. PCI NIC doesn't matter.
> (I am too lazy to send in:
> https://github.com/ardje/linux/commit/93e0b1fec38859ff0fb6e24eab10778f5b3be289
> )
> 2) ARM vs. x86 doesn't matter.
> 
> Anyway, here are the kmemleak report and the dmesg after almost 2 hours:
> https://github.com/hardkernel/linux/files/2233646/kmemleak-antec.txt
> https://github.com/hardkernel/linux/files/2233648/dmesg.txt
> 
> Also, the kmemleak.txt from the x86 box seems to be more verbose:
> 
> unreferenced object 0xffff880196472400 (size 512):
>   comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
>   hex dump (first 32 bytes):
>     b8 d7 7c 8d 01 88 ff ff 00 00 00 00 00 00 00 00  ..|.............
>     05 00 00 00 08 00 00 00 52 05 30 06 1e 00 00 10  ........R.0.....
>   backtrace:
>     [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
>     [<ffffffffa037b412>] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
>     [<ffffffffa037df96>] fcoe_ctlr_vn_recv+0x496/0xad0 [libfcoe]
>     [<ffffffffa037ecd0>] fcoe_ctlr_recv_work+0x700/0xfb0 [libfcoe]
>     [<ffffffff810c7622>] process_one_work+0x142/0x370
>     [<ffffffff810c7b82>] worker_thread+0x62/0x3d0
>     [<ffffffff810cc4d4>] kthread+0x114/0x150
>     [<ffffffff81a001e5>] ret_from_fork+0x35/0x40
>     [<ffffffffffffffff>] 0xffffffffffffffff
> 
> vs. the ARM one:
> unreferenced object 0xe07d9b00 (size 256):
>   comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
>   hex dump (first 32 bytes):
>     70 64 49 ec 00 00 00 00 07 00 00 00 08 00 00 00  pdI.............
>     88 40 7f 1d 24 00 00 10 88 40 7f 1d 24 00 00 20  .@..$....@..$.. 
>   backtrace:
>     [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
>     [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
>     [<bf33a400>] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
>     [<c013dbb0>] process_one_work+0x138/0x4bc
>     [<c013df68>] worker_thread+0x34/0x4f4
>     [<c01434a8>] kthread+0x12c/0x15c
>     [<c0108928>] ret_from_fork+0x14/0x2c
>     [<ffffffff>] 0xffffffff
> 
> Now the x86 dump leads me to:
> http://lists.open-fcoe.org/pipermail/fcoe-devel/2013-May/012014.html
> 
> I had actually already got there from my ARM dump, but the backtraces differ.
> Anyway:
> root@antec:~# grep -c fc_rport_create kmemleak.txt
> 44
> So 44 * 512 bytes (22528 bytes) leaked in that path. One more thing: the
> leaked object was allocated in libfc, not libfcoe.
> In other words, just like in that bug report, we are leaking fc_rport_priv.
> But one thing I don't understand (yet) is why fc_rport_create() is called
> while we already have a port.
> 
> Anyway, I will continue bug hunting. It's night, and the temperature has dropped to 29.8.

Just to be sure, did you revert the patch I mentioned? I'm not sure it
is the one that introduced the bug, but it's definitely worth a try.
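
For tallying a kmemleak report like the ones quoted above, here is a
minimal sketch. The tally() helper is hypothetical and not part of
kmemleak or any kernel tooling. It assumes the record format shown in
the dumps: an "unreferenced object ... (size N):" header followed by
"[<addr>] func+off/len" backtrace frames. Unlike grep -c, which counts
matching lines, it counts whole records and sums their sizes. It also
strips compiler suffixes such as ".isra.17", so both the x86 and the
ARM backtraces match fcoe_ctlr_vn_add.

```python
import re

# Header of one kmemleak record, e.g.
#   unreferenced object 0xffff880196472400 (size 512):
HEADER = re.compile(r"unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
# One backtrace frame, e.g.
#   [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
FRAME = re.compile(r"\[<[0-9a-f]+>\] ([\w.]+)\+0x")


def tally(report: str, func: str) -> tuple[int, int]:
    """Return (objects, total bytes) for records whose backtrace
    contains `func`. Compiler suffixes like ".isra.17" are ignored."""
    objects = 0
    total = 0
    size = None  # size of the record currently being parsed
    hit = False  # has `func` appeared in this record's backtrace?

    def flush():
        # Account for the record that just ended, if it matched.
        nonlocal objects, total
        if size is not None and hit:
            objects += 1
            total += size

    for line in report.splitlines():
        m = HEADER.search(line)
        if m:
            flush()
            size, hit = int(m.group(1)), False
            continue
        m = FRAME.search(line)
        if m and m.group(1).split(".")[0] == func:
            hit = True
    flush()
    return objects, total


sample = """\
unreferenced object 0xffff880196472400 (size 512):
  backtrace:
    [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
    [<ffffffffa037b412>] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
unreferenced object 0xe07d9b00 (size 256):
  backtrace:
    [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
"""
print(tally(sample, "fc_rport_create"))   # (1, 512)
print(tally(sample, "fcoe_ctlr_vn_add"))  # (2, 768)
```

On the full x86 report, 44 matching records of 512 bytes each would
come to 44 * 512 = 22528 bytes (22 KiB).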

Thanks,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850