On Sun, Apr 09, 2023 at 01:49:30PM +0200, Mirsad Goran Todorovac wrote:
> Hi all,
> 
> There appears to be a memleak triggered by the selftest drivers/net/team.

Thanks for the report. Not sure it's related to team, see below.

> 
> # cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff8c18def8ee00 (size 256):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     00 20 09 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc00b>] vlan_vid_add+0x11b/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc176515e>] 0xffffffffc176515e

Don't know what this is. Might be another issue.

>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c18250d3700 (size 32):
>   comm "ip", pid 5727, jiffies 4294961159 (age 954.244s)
>   hex dump (first 32 bytes):
>     a0 ee f8 de 18 8c ff ff a0 ee f8 de 18 8c ff ff  ................
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc064>] vlan_vid_add+0x174/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc176515e>] 0xffffffffc176515e
>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c1846e16800 (size 256):
>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     00 20 f7 de 18 8c ff ff 00 00 00 00 00 00 00 00  . ..............
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc00b>] vlan_vid_add+0x11b/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc177115e>] bond_enslave+0x34e/0x1840 [bonding]

This shows that the issue is related to the bond driver, not team.
>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20
> unreferenced object 0xffff8c184c5ff2a0 (size 32):

This is 'struct vlan_vid_info'.

>   comm "ip", pid 7837, jiffies 4295135225 (age 258.160s)
>   hex dump (first 32 bytes):
>     a0 68 e1 46 18 8c ff ff a0 68 e1 46 18 8c ff ff  .h.F.....h.F....
>     81 00 00 00 01 00 00 00 cc cc cc cc cc cc cc cc  ................
            ^ VLAN ID 0
>   backtrace:
>     [<ffffffffb60fb25c>] slab_post_alloc_hook+0x8c/0x3e0
>     [<ffffffffb6102b39>] __kmem_cache_alloc_node+0x1d9/0x2a0
>     [<ffffffffb607684e>] kmalloc_trace+0x2e/0xc0
>     [<ffffffffb6dbc064>] vlan_vid_add+0x174/0x290
>     [<ffffffffb6dbcffc>] vlan_device_event+0x19c/0x880
>     [<ffffffffb5dde4d7>] raw_notifier_call_chain+0x47/0x70
>     [<ffffffffb6ab6940>] call_netdevice_notifiers_info+0x50/0xa0
>     [<ffffffffb6ac7574>] dev_open+0x94/0xa0
>     [<ffffffffc177115e>] bond_enslave+0x34e/0x1840 [bonding]
>     [<ffffffffb6ada6b0>] do_set_master+0x90/0xb0
>     [<ffffffffb6adc5f4>] do_setlink+0x514/0x11f0
>     [<ffffffffb6ae4507>] __rtnl_newlink+0x4e7/0xa10
>     [<ffffffffb6ae4a8c>] rtnl_newlink+0x4c/0x70
>     [<ffffffffb6adf334>] rtnetlink_rcv_msg+0x184/0x5d0
>     [<ffffffffb6b6ad1e>] netlink_rcv_skb+0x5e/0x110
>     [<ffffffffb6ada0e9>] rtnetlink_rcv+0x19/0x20

VLAN ID 0 is automatically added by the 8021q driver when a net device is
opened. In this case, it's a device being enslaved to a bond. I believe the
issue was exposed by the new bond test that was added in commit 222c94ec0ad4
("selftests: bonding: add tests for ether type changes") as part of v6.3-rc3.

The VLAN is supposed to be removed by the 8021q driver when the net device is
closed, and the bond driver indeed calls dev_close() when a slave is removed.
However, dev_close() is a no-op when 'IFF_UP' is not set. Unfortunately, when
a bond changes its type to Ethernet, this flag is incorrectly cleared in
bond_ether_setup(), causing this VLAN to linger. As far as I can tell, it's
not a new issue.

A temporary fix is [1]. Please test it, although we might end up with a
different fix (needs more thinking and it's already late here).

Reproduced using [2]. You can see in the before/after output how the flag is
cleared/retained [3].

[1]
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 236e5219c811..50dc068dc259 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1777,14 +1777,15 @@ void bond_lower_state_changed(struct slave *slave)
 
 /* The bonding driver uses ether_setup() to convert a master bond device
  * to ARPHRD_ETHER, that resets the target netdevice's flags so we always
- * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE if it was set
+ * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE and IFF_UP
+ * if they were set
  */
 static void bond_ether_setup(struct net_device *bond_dev)
 {
-	unsigned int slave_flag = bond_dev->flags & IFF_SLAVE;
+	unsigned int flags = bond_dev->flags & (IFF_SLAVE | IFF_UP);
 
 	ether_setup(bond_dev);
-	bond_dev->flags |= IFF_MASTER | slave_flag;
+	bond_dev->flags |= IFF_MASTER | flags;
 	bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 }
 

[2]
#!/bin/bash

ip link add name t-nlmon type nlmon
ip link add name t-dummy type dummy
ip link add name t-bond type bond mode active-backup
ip link set dev t-bond up
ip link set dev t-nlmon master t-bond
ip link set dev t-nlmon nomaster
ip link show dev t-bond
ip link set dev t-dummy master t-bond
ip link show dev t-bond
ip link del dev t-bond
ip link del dev t-dummy
ip link del dev t-nlmon

[3]
Before:

12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink 
12: t-bond: <BROADCAST,MULTICAST,MASTER,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ce:b2:31:0a:53:83 brd ff:ff:ff:ff:ff:ff

After:

12: t-bond: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/netlink 
12: t-bond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 5a:18:e7:85:11:73 brd ff:ff:ff:ff:ff:ff