On Fri, Jul 27, 2018 at 12:49:55AM +0200, ard wrote:
> Hi,
>
> On Thu, Jul 26, 2018 at 05:05:37PM +0200, Johannes Thumshirn wrote:
> > On Thu, Jul 26, 2018 at 04:25:24PM +0200, ard wrote:
> > > The system itself is an exynos 5422 arm. It worked perfectly fine
> > > with 3.10 as an Initiator, now it leaks memory the moment I
> > > enable the FCoE vlan on the port.
> >
> > So I had a look through the commits between v3.10 and v4.14 and this
> > one sticks out:
> > ea0a95d7f162 ("fcoe: Use kfree_skb() instead of kfree()")
> >
> > While I think it is necessary to release a skb with kfree_skb() it
> > still might be worth trying to revert it for a test run.
>
> So I had a recompile for the desktop (i920)
> And fortunately after 2 hours he was already collecting memory
> leaks.
> This makes to me at least a few unknowns more clean:
> 1) usb vs pci nic doesn't matter.
>    (I am too lazy to send in:
>    https://github.com/ardje/linux/commit/93e0b1fec38859ff0fb6e24eab10778f5b3be289
>    )
> 2) ARM vs X86 doesn't matter
>
> Anyway: here are the kmemleak and the dmesg after almost 2 hours:
> https://github.com/hardkernel/linux/files/2233646/kmemleak-antec.txt
> https://github.com/hardkernel/linux/files/2233648/dmesg.txt
>
> Also the kmemleak.txt of the x86 seems to be more verbose:
>
> unreferenced object 0xffff880196472400 (size 512):
>   comm "kworker/7:2", pid 120, jiffies 4301444306 (age 1225.078s)
>   hex dump (first 32 bytes):
>     b8 d7 7c 8d 01 88 ff ff 00 00 00 00 00 00 00 00  ..|.............
>     05 00 00 00 08 00 00 00 52 05 30 06 1e 00 00 10  ........R.0.....
>   backtrace:
>     [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
>     [<ffffffffa037b412>] fcoe_ctlr_vn_add.isra.17+0x42/0x1d0 [libfcoe]
>     [<ffffffffa037df96>] fcoe_ctlr_vn_recv+0x496/0xad0 [libfcoe]
>     [<ffffffffa037ecd0>] fcoe_ctlr_recv_work+0x700/0xfb0 [libfcoe]
>     [<ffffffff810c7622>] process_one_work+0x142/0x370
>     [<ffffffff810c7b82>] worker_thread+0x62/0x3d0
>     [<ffffffff810cc4d4>] kthread+0x114/0x150
>     [<ffffffff81a001e5>] ret_from_fork+0x35/0x40
>     [<ffffffffffffffff>] 0xffffffffffffffff
>
> vs:
> unreferenced object 0xe07d9b00 (size 256):
>   comm "kworker/0:1", pid 97, jiffies 4294944354 (age 209914.188s)
>   hex dump (first 32 bytes):
>     70 64 49 ec 00 00 00 00 07 00 00 00 08 00 00 00  pdI.............
>     88 40 7f 1d 24 00 00 10 88 40 7f 1d 24 00 00 20  .@..$....@..$..
>   backtrace:
>     [<bf3382ec>] fcoe_ctlr_vn_add+0x3c/0x1b4 [libfcoe]
>     [<bf338c64>] fcoe_ctlr_vn_recv+0x800/0xb2c [libfcoe]
>     [<bf33a400>] fcoe_ctlr_recv_work+0xb94/0x17f0 [libfcoe]
>     [<c013dbb0>] process_one_work+0x138/0x4bc
>     [<c013df68>] worker_thread+0x34/0x4f4
>     [<c01434a8>] kthread+0x12c/0x15c
>     [<c0108928>] ret_from_fork+0x14/0x2c
>     [<ffffffff>] 0xffffffff
>
> Now the x86 dump leads me to:
> http://lists.open-fcoe.org/pipermail/fcoe-devel/2013-May/012014.html
>
> Actually already got there from my arm dump, but they are different in backtrace.
> Anyway:
> root@antec:~# grep -c fc_rport_create kmemleak.txt
> 44
> So 44 * 512 bytes leaked in that path. And an extra thing: "it was leaked in" libfc and not libfcoe.
> Or just like the bug report we were leaking fc_rport_priv.
> But one thing I don't understand (yet) is why the fc_rport_create happens while
> we already have a port.
>
> Anyway, I will continue bug hunting. It's night, and the temperature has dropped to 29.8 .

Just to be sure, did you revert the patch I mentioned? I'm not sure it
is the one that introduced the bug, but it's definitely worth a try.
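
For comparing a run with and without the revert, the grep count above
could be extended into a per-function tally. This is only a rough sketch,
assuming the report keeps the kmemleak layout quoted above; the helper
name summarize_kmemleak is made up for illustration and is not an existing
tool. It attributes each record to the first frame of its backtrace:

```python
import re
from collections import defaultdict

# Header of a kmemleak record, e.g.
#   unreferenced object 0xffff880196472400 (size 512):
OBJ_RE = re.compile(r"unreferenced object 0x[0-9a-fA-F]+ \(size (\d+)\):")

# A symbolized backtrace frame, e.g.
#   [<ffffffffa0344c62>] fc_rport_create+0x42/0x190 [libfc]
FRAME_RE = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+(\S+?)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def summarize_kmemleak(text):
    """Return {first_frame: (record_count, total_bytes)} for a report.

    Hypothetical helper: only the first symbolized frame after each
    "unreferenced object" header is counted for that record.
    """
    totals = defaultdict(lambda: [0, 0])
    size = None  # size of the record whose first frame is still pending
    for line in text.splitlines():
        header = OBJ_RE.search(line)
        if header:
            size = int(header.group(1))
            continue
        frame = FRAME_RE.search(line)
        if frame and size is not None:
            entry = totals[frame.group(1)]
            entry[0] += 1
            entry[1] += size
            size = None  # ignore the remaining frames of this record
    return {name: tuple(entry) for name, entry in totals.items()}
```

Fed the contents of kmemleak.txt, it returns one (count, bytes) pair per
allocating function. For the fc_rport_create path reported above,
44 records of 512 bytes would come to 22528 bytes.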

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850