Re: FCOE vn2vn memory leaks in 4.14

ard <ard@xxxxxxxxx> · Fri, 27 Jul 2018 00:49:55 +0200

Hi,

On Thu, Jul 26, 2018 at 05:05:37PM +0200, Johannes Thumshirn wrote:
> On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > The system itself is an exynos 5422 arm. It worked perfectly fine
> > with 3.10 as an Initiator, now it leaks memory the moment I
> > enable the FCoE vlan on the port.
> 
> So I had a look through the commits between v3.10 and v4.14 and this
> one sticks out:
> ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")
> 
> While I think it is necessary to release a skb with kfree_skb() it
> still might be worth trying to revert it for a test run.

So I recompiled for the desktop (i920), and fortunately after
2 hours it was already collecting memory leaks.
For me this clears up at least a few unknowns:
1) usb vs pci nic doesn't matter.
(I am too lazy to send in:
https://github.com/ardje/linux/commit/93e0b1fec38859ff0fb6e24eab10778f5b3be289
)
2) ARM vs X86 doesn't matter

Anyway: here are the kmemleak and the dmesg after almost 2 hours:
https://github.com/hardkernel/linux/files/2233646/kmemleak-antec.txt
https://github.com/hardkernel/linux/files/2233648/dmesg.txt

Also the kmemleak.txt of the x86 seems to be more verbose:

unreferenced object 0xffff880196472400 (size 512):
  comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
  hex dump (first 32 bytes):
    b8 d7 7c 8d 01 88 ff ff 00 00 00 00 00 00 00 00  ..|.............
    05 00 00 00 08 00 00 00 52 05 30 06 1e 00 00 10  ........R.0.....
  backtrace:
    [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
    [<ffffffffa037b412>] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
    [<ffffffffa037df96>] fcoe_ctlr_vn_recv+0x496/0xad0 [libfcoe]
    [<ffffffffa037ecd0>] fcoe_ctlr_recv_work+0x700/0xfb0 [libfcoe]
    [<ffffffff810c7622>] process_one_work+0x142/0x370
    [<ffffffff810c7b82>] worker_thread+0x62/0x3d0
    [<ffffffff810cc4d4>] kthread+0x114/0x150
    [<ffffffff81a001e5>] ret_from_fork+0x35/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

vs:
unreferenced object 0xe07d9b00 (size 256):
  comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
  hex dump (first 32 bytes):
    70 64 49 ec 00 00 00 00 07 00 00 00 08 00 00 00  pdI.............
    88 40 7f 1d 24 00 00 10 88 40 7f 1d 24 00 00 20  .@..$....@..$.. 
  backtrace:
    [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
    [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
    [<bf33a400>] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
    [<c013dbb0>] process_one_work+0x138/0x4bc
    [<c013df68>] worker_thread+0x34/0x4f4
    [<c01434a8>] kthread+0x12c/0x15c
    [<c0108928>] ret_from_fork+0x14/0x2c
    [<ffffffff>] 0xffffffff

Now the x86 dump leads me to:
http://lists.open-fcoe.org/pipermail/fcoe-devel/2013-May/012014.html

I had actually already got there from my ARM dump, but the two backtraces differ.
Anyway:
root@antec:~# grep -c fc_rport_create kmemleak.txt
44
So 44 * 512 bytes (22528 bytes) leaked in that path. One more thing: the
leaked allocation is made in libfc, not in libfcoe.
So, just like in that bug report, we seem to be leaking fc_rport_priv.
But one thing I don't understand (yet) is why fc_rport_create is called while
we already have a port.
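
For anyone who wants more than a grep count: below is a minimal sketch
(Python, not what I used above) that groups kmemleak records by the
function at the top of each backtrace and sums the reported sizes. The
regexes assume the text format of the two dumps quoted in this mail;
summarize and SAMPLE are names made up for this example, and SAMPLE is
just a shortened copy of those two records.

```python
import re
from collections import defaultdict

# "unreferenced object 0xffff880196472400 (size 512):"
HEADER = re.compile(r"^unreferenced object 0x[0-9a-f]+ \(size (\d+)\):")
# "    [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]"
FRAME = re.compile(
    r"^\s+\[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)

# Shortened copy of the two records from this mail (hex dumps omitted).
SAMPLE = """\
unreferenced object 0xffff880196472400 (size 512):
  comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
  backtrace:
    [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
    [<ffffffffa037b412>] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xe07d9b00 (size 256):
  comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
  backtrace:
    [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
    [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
    [<ffffffff>] 0xffffffff
"""


def summarize(text):
    """Map (symbol, module) of the top frame to (object count, total bytes)."""
    totals = defaultdict(lambda: [0, 0])
    pending_size = None  # size of the record whose top frame we still need
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            pending_size = int(header.group(1))
            continue
        frame = FRAME.match(line)
        if frame and pending_size is not None:
            entry = totals[(frame.group(1), frame.group(2))]
            entry[0] += 1
            entry[1] += pending_size
            pending_size = None  # ignore the deeper frames of this record
    return {key: tuple(value) for key, value in totals.items()}


if __name__ == "__main__":
    for (symbol, module), (count, total) in sorted(summarize(SAMPLE).items()):
        print(f"{symbol} [{module}]: {count} objects, {total} bytes")
```

To run it on a real dump, pass the contents of kmemleak.txt to
summarize() instead of SAMPLE. Only the top frame is used as the key, so
the x86 and ARM dumps end up under different names (fc_rport_create vs
fcoe_ctlr_vn_add), which matches the backtrace difference noted above.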

Anyway, I will continue bug hunting. It's night, and the temperature has dropped to 29.8.

Regards,
Ard

-- 
.signature not found