On 1/25/24 07:23, Ajay.Kathat@xxxxxxxxxxxxx wrote: > On 1/24/24 11:45, David Mosberger-Tang wrote: >> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe >> >> On Wed, 2024-01-24 at 10:31 -0700, David Mosberger-Tang wrote: >>> Alexis, >>> >>> On Wed, 2024-01-24 at 10:01 +0100, Alexis Lothoré wrote: >>>> ================================================================== >>>> BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0x294/0x2c0 >>>> Read of size 4 at addr c3c91ce8 by task swapper/1 Replying a bit late to your initial questions: - I am running an ARM32 platform (SAMA5D27) - for the wilc_netdev_cleanup line, the 0x294 offset indeed points to list_for_each_entry_rcu(vif, &wilc->vif_list, list) in my case, but you seemed to have identified this already. >>> >>> OK, I think I see what's going on: it's the list traversal. Here is what >>> wilc_netdev_cleanup() does: >>> >>> list_for_each_entry_rcu(vif, &wilc->vif_list, list) { >>> if (vif->ndev) >>> unregister_netdev(vif->ndev); >>> } >>> >>> The problem is that "vif" is the private part of the netdev, so when the netdev >>> is freed, the vif structure is no longer valid and list_for_each_entry_rcu() >>> ends up dereferencing a dangling pointer. Your diagnostic sounds relevant :) >>> Ajay or Alexis, could you propose a fix for this - this is not something I'm >>> familiar with. >> >> Actually, after staring at the code a little longer, is there anything wrong >> with delaying the unregister_netdev() call until after the vif has been removed >> from the vif-list? Something along the lines of the below. I guess you _could_ do something like this, but I think you have to be very careful about potential races. If I'm not wrong the following could happen: - you enter the wilc_netdev_cleanup and start removing vifs from list - meanwhile, because your net device is still registered, userspace can start calling concurrently some cgf80211_ops - some of those ops may try to get the vif matching your netdevice from the list, but it is not there anymore Maybe some rtnl or wiphy lock (used from cfg80211 core or net core) will save you from this for some of wilc_netdev_cleanup calls, but I think that won't be true for the one in the error path of the probe chain. > I think we need to investigate the actual reason for Kasan warning. > First, we need to confirm if this warning exists without the patch(test > by simulating a force error in wilc_bus_probe()). When > wilc_netdev_cleanup() is also called from wilc_bus_remove(), I believe > this warning was not observed. Once it is confirmed, the fix can be done > accordingly. It happens too in wilc_bus_remove: echo "spi0.1" > /sys/bus/spi/drivers/wilc1000_spi/unbind ================================================================== BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0xf0/0x244 Read of size 4 at addr c3c91ce8 by task sh/91 CPU: 0 PID: 91 Comm: sh Not tainted 6.7.0-wt+ #845 Hardware name: Atmel SAMA5 unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x34/0x48 dump_stack_lvl from print_report+0x154/0x500 print_report from kasan_report+0xd8/0x100 kasan_report from wilc_netdev_cleanup+0xf0/0x244 wilc_netdev_cleanup from wilc_bus_remove+0x50/0x5c wilc_bus_remove from spi_remove+0x40/0x50 spi_remove from device_release_driver_internal+0x21c/0x2ac device_release_driver_internal from unbind_store+0x70/0xac unbind_store from kernfs_fop_write_iter+0x1a0/0x284 kernfs_fop_write_iter from vfs_write+0x38c/0x6f4 vfs_write from ksys_write+0xd8/0x178 ksys_write from ret_fast_syscall+0x0/0x1c Exception stack(0xc588ffa8 to 0xc588fff0) ffa0: 00000007 004eff68 00000001 004eff68 00000007 00000001 ffc0: 00000007 004eff68 00000001 00000004 00000007 b69546c0 00000020 004ed190 ffe0: 00000004 b6954678 aec3a041 aebd1a26 > Avoiding netdev initialization in wilc_cfg80211_init() would require lot > of modification and changes in API call order. IMO the Kasan warning fix > should be independent of netdev initialization order and it should > free-up the resources for all cases. Suppose if the card is not present, > the probe API should clean-up and exit gracefully. For detecting the > chip_id, I have created the below patch considering the above scenarios, > please check if it makes sense. Agree about the wilc_netdev_cleanup fix being a separated topic, to be handled accordingly. Still, the scenario I mention above, if true, makes me more confident that we should not register at all the netdevice before being able to manage it. Maybe those cases are already handled correctly with some checks to make sure no real crash occurs, but all those checks could be avoided if we did not register the netdevice so early Alexis -- Alexis Lothoré, Bootlin Embedded Linux and Kernel engineering https://bootlin.com