On 24/04/17 17:41, Xin Long wrote: > On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov > <nikolay@xxxxxxxxxxxxxxxxxxx> wrote: >> On 24/04/17 14:01, Nikolay Aleksandrov wrote: >>> On 24/04/17 10:25, Xin Long wrote: >>>> During removing a bridge device, if the bridge is still up, a new mdb entry >>>> still can be added in br_multicast_add_group() after all mdb entries are >>>> removed in br_multicast_dev_del(). Like the path: >>>> >>>> mld_ifc_timer_expire -> >>>> mld_sendpack -> ... >>>> br_multicast_rcv -> >>>> br_multicast_add_group >>>> >>>> The new mp's timer will be set up. If the timer expires after the bridge >>>> is freed, it may cause use-after-free panic in br_multicast_group_expired. >>>> This can happen when ip link remove a bridge or destroy a netns with a >>>> bridge device inside. >>>> >>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge >>>> device after it's shutdown. >>>> >>>> This patch is to call dev_close at the beginning of br_dev_delete so that >>>> netif_running check in br_multicast_add_group can avoid this issue. But >>>> to keep consistent with before, it will not remove the IFF_UP check in >>>> br_del_bridge for brctl. >>>> >>>> Reported-by: Jianwen Ji <jiji@xxxxxxxxxx> >>>> Signed-off-by: Xin Long <lucien.xin@xxxxxxxxx> >>>> --- >>>> net/bridge/br_if.c | 2 ++ >>>> 1 file changed, 2 insertions(+) >>>> >>> >>> +CC bridge maintainers >>> >>> I can see how this could happen, could you also provide the traceback ? >>> >>> The patch looks good to me, actually I think it fixes another issue with >>> mcast stats where the percpu pointer can be accessed after it's freed if >>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call. >>> This is definitely stable material, if I'm not mistaken the issue is there since >>> the introduction of br_dev_delete: >>> commit e10177abf842 >>> Author: Satish Ashok <sashok@xxxxxxxxxxxxxxxxxxx> >>> Date: Wed Jul 15 07:16:51 2015 -0700 >>> >>> bridge: multicast: fix handling of temp and perm entries >>> >>> >>> >>> Acked-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxxxxxxxxxxx> >>> >> >> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy. >> Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ? >> That should have the same effect and be much faster. > Yes. But it seems that all cleanups for bridge should be done after > it's shutdown since beginning according to brctl. I'm not sure if there > are still other problems caused by this. maybe safer to use dev_close. > I need to check more to confirm this. > ndo_uninit() is after the device has been stopped, so it is the same as your fix as I said. > I also have another question about mp->timer removing. > As we can see, now it removes this timer with del_timer, instead of > del_timer_sync. What if the timer is running when del_timer ? > How can we be sure that br_multicast_group_expired will be done > before the bridge dev is freed. synchronize_net ? > Yeah, I've been thinking about that and the only race is that the timer might have fired and waiting for the lock while the mdb is being flushed thus the cancel_timer() won't affect it and then it will enter and see that !netif_running(br->dev), but unfortunately there's a bug because we cannot guarantee that br->dev still exists at that point. This is a different bug though. >> >> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated >> and not freed due to the mdb rehash, here's also kmemleak's object: >> > yeps, ;-) > >> unreferenced object 0xffff8800540ba800 (size 2048): >> comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s) >> hex dump (first 32 bytes): >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ >> backtrace: >> [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0 >> [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0 >> [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge] >> [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge] >> [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge] >> [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge] >> [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge] >> [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0 >> [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10 >> [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20 >> [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6] >> [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6] >> [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6] >> [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6] >> [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6] >> [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6] >> >> >>