在 2023/7/19 13:04, Eric Dumazet 写道: > On Wed, Jul 19, 2023 at 6:41 AM Ziyang Xuan (William) > <william.xuanziyang@xxxxxxxxxx> wrote: >> >>> On Mon, Jul 17, 2023 at 9:27 AM Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> wrote: >>>> >>>> On 11.07.2023 09:17:37, Ziyang Xuan wrote: >>>>> Got kmemleak errors with the following ltp can_filter testcase: >>>>> >>>>> for ((i=1; i<=100; i++)) >>>>> do >>>>> ./can_filter & >>>>> sleep 0.1 >>>>> done >>>>> >>>>> ============================================================== >>>>> [<00000000db4a4943>] can_rx_register+0x147/0x360 [can] >>>>> [<00000000a289549d>] raw_setsockopt+0x5ef/0x853 [can_raw] >>>>> [<000000006d3d9ebd>] __sys_setsockopt+0x173/0x2c0 >>>>> [<00000000407dbfec>] __x64_sys_setsockopt+0x61/0x70 >>>>> [<00000000fd468496>] do_syscall_64+0x33/0x40 >>>>> [<00000000b7e47d51>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 >>>>> >>>>> It's a bug in the concurrent scenario of unregister_netdevice_many() >>>>> and raw_release() as following: >>>>> >>>>> cpu0 cpu1 >>>>> unregister_netdevice_many(can_dev) >>>>> unlist_netdevice(can_dev) // dev_get_by_index() return NULL after this >>>>> net_set_todo(can_dev) >>>>> raw_release(can_socket) >>>>> dev = dev_get_by_index(, ro->ifindex); // dev == NULL >>>>> if (dev) { // receivers in dev_rcv_lists not free because dev is NULL >>>>> raw_disable_allfilters(, dev, ); >>>>> dev_put(dev); >>>>> } >>>>> ... >>>>> ro->bound = 0; >>>>> ... >>>>> >>>>> call_netdevice_notifiers(NETDEV_UNREGISTER, ) >>>>> raw_notify(, NETDEV_UNREGISTER, ) >>>>> if (ro->bound) // invalid because ro->bound has been set 0 >>>>> raw_disable_allfilters(, dev, ); // receivers in dev_rcv_lists will never be freed >>>>> >>>>> Add a net_device pointer member in struct raw_sock to record bound can_dev, >>>>> and use rtnl_lock to serialize raw_socket members between raw_bind(), raw_release(), >>>>> raw_setsockopt() and raw_notify(). Use ro->dev to decide whether to free receivers in >>>>> dev_rcv_lists. >>>>> >>>>> Fixes: 8d0caedb7596 ("can: bcm/raw/isotp: use per module netdevice notifier") >>>>> Signed-off-by: Ziyang Xuan <william.xuanziyang@xxxxxxxxxx> >>>>> Reviewed-by: Oliver Hartkopp <socketcan@xxxxxxxxxxxx> >>>>> Acked-by: Oliver Hartkopp <socketcan@xxxxxxxxxxxx> >>>> >>>> Added to linux-can/testing. >>>> >>> >>> This patch causes three syzbot LOCKDEP reports so far. >> >> Hello Eric, >> >> Is there reproducer? I want to understand the specific root cause. >> > > No repro yet, but simply look at other functions in net/can/raw.c > > You must always take locks in the same order. > > raw_bind(), raw_setsockopt() use: > > rtnl_lock(); > lock_sock(sk); > > Therefore, raw_release() must _also_ use the same order, or risk deadlock. > > Please build a LOCKDEP enabled kernel, and run your tests ? I know now. This needs raw_bind() and raw_setsockopt() concurrent with raw_release(). And there is not the scenario in my current testcase. I did not get it. I will try to reproduce it and add the testcase. Thank you for your patient explanation. William Xuan. > . >