Search Linux Wireless

Re: [PATCH] wifi: wilc1000: validate chip id during bus probe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/25/24 04:04, Alexis Lothoré wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 1/25/24 07:23, Ajay.Kathat@xxxxxxxxxxxxx wrote:
>> On 1/24/24 11:45, David Mosberger-Tang wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> On Wed, 2024-01-24 at 10:31 -0700, David Mosberger-Tang wrote:
>>>> Alexis,
>>>>
>>>> On Wed, 2024-01-24 at 10:01 +0100, Alexis Lothoré wrote:
>>>>> ==================================================================
>>>>> BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0x294/0x2c0
>>>>> Read of size 4 at addr c3c91ce8 by task swapper/1
> 
> Replying a bit late to your initial questions:
> - I am running an ARM32 platform (SAMA5D27)
> - for the wilc_netdev_cleanup line, the 0x294 offset indeed points to
> list_for_each_entry_rcu(vif, &wilc->vif_list, list) in my case, but you seemed
> to have identified this already.
> 
>>>>
>>>> OK, I think I see what's going on: it's the list traversal.  Here is what
>>>> wilc_netdev_cleanup() does:
>>>>
>>>>       list_for_each_entry_rcu(vif, &wilc->vif_list, list) {
>>>>               if (vif->ndev)
>>>>                       unregister_netdev(vif->ndev);
>>>>       }
>>>>
>>>> The problem is that "vif" is the private part of the netdev, so when the netdev
>>>> is freed, the vif structure is no longer valid and list_for_each_entry_rcu()
>>>> ends up dereferencing a dangling pointer.
> 
> Your diagnostic sounds relevant :)
> 
>>>> Ajay or Alexis, could you propose a fix for this - this is not something I'm
>>>> familiar with.
>>>
>>> Actually, after staring at the code a little longer, is there anything wrong
>>> with delaying the unregister_netdev() call until after the vif has been removed
>>> from the vif-list?  Something along the lines of the below.
> 
> I guess you _could_ do something like this, but I think you have to be very
> careful about potential races. If I'm not wrong the following could happen:
> - you enter the wilc_netdev_cleanup and start removing vifs from list
> - meanwhile, because your net device is still registered, userspace can start
> calling concurrently some cgf80211_ops

IFAIK cfg80211_ops shouldn't get called until the netdev interface is
up(ifconfig wlan0 up). In the probe, only the netdev interface is
allocated and cfg80211_ops is registered and the cfg80211_ops callback
should be called when the interface is up.

> - some of those ops may try to get the vif matching your netdevice from the
> list, but it is not there anymore
> 
I have not tested this by enabling Kasan configuration so I haven't
observe this issue yet. As I understand, the warning(use-after-free) is
reported for the below line in wilc_netdev_cleanup() so that must be the
code where used-after-free warning is reported. Isn't it.

>>>>       list_for_each_entry_rcu(vif, &wilc->vif_list, list) {


> Maybe some rtnl or wiphy lock (used from cfg80211 core or net core) will save
> you from this for some of wilc_netdev_cleanup calls, but I think that won't be
> true for the one in the error path of the probe chain.
> 
>> I think we need to investigate the actual reason for Kasan warning.
>> First, we need to confirm if this warning exists without the patch(test
>> by simulating a force error in wilc_bus_probe()). When
>> wilc_netdev_cleanup() is also called from wilc_bus_remove(), I believe
>> this warning was not observed. Once it is confirmed, the fix can be done
>> accordingly.
> 
> It happens too in wilc_bus_remove:
> echo "spi0.1" > /sys/bus/spi/drivers/wilc1000_spi/unbind

Okay, so it confirms that this warning is not related to the probe
function.

> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in wilc_netdev_cleanup+0xf0/0x244
> Read of size 4 at addr c3c91ce8 by task sh/91
> 
> CPU: 0 PID: 91 Comm: sh Not tainted 6.7.0-wt+ #845
> Hardware name: Atmel SAMA5
>  unwind_backtrace from show_stack+0x18/0x1c
>  show_stack from dump_stack_lvl+0x34/0x48
>  dump_stack_lvl from print_report+0x154/0x500
>  print_report from kasan_report+0xd8/0x100
>  kasan_report from wilc_netdev_cleanup+0xf0/0x244
>  wilc_netdev_cleanup from wilc_bus_remove+0x50/0x5c
>  wilc_bus_remove from spi_remove+0x40/0x50
>  spi_remove from device_release_driver_internal+0x21c/0x2ac
>  device_release_driver_internal from unbind_store+0x70/0xac
>  unbind_store from kernfs_fop_write_iter+0x1a0/0x284
>  kernfs_fop_write_iter from vfs_write+0x38c/0x6f4
>  vfs_write from ksys_write+0xd8/0x178
>  ksys_write from ret_fast_syscall+0x0/0x1c
> Exception stack(0xc588ffa8 to 0xc588fff0)
> ffa0:                   00000007 004eff68 00000001 004eff68 00000007 00000001
> ffc0: 00000007 004eff68 00000001 00000004 00000007 b69546c0 00000020 004ed190
> ffe0: 00000004 b6954678 aec3a041 aebd1a26
> 
>> Avoiding netdev initialization in wilc_cfg80211_init() would require lot
>> of modification and changes in API call order. IMO the Kasan warning fix
>> should be independent of netdev initialization order and it should
>> free-up the resources for all cases. Suppose if the card is not present,
>> the probe API should clean-up and exit gracefully. For detecting the
>> chip_id, I have created the below patch considering the above scenarios,
>> please check if it makes sense.
> 
> Agree about the wilc_netdev_cleanup fix being a separated topic, to be handled
> accordingly.
> Still, the scenario I mention above, if true, makes me more confident that we
> should not register at all the netdevice before being able to manage it. Maybe
> those cases are already handled correctly with some checks to make sure no real
> crash occurs, but all those checks could be avoided if we did not register the
> netdevice so early
> neering

I think this issue is not related to early registration of netdevice
since same behavior was observed with "echo "spi0.1" >
/sys/bus/spi/drivers/wilc1000_spi/unbind" command.

If needed, the netdevice allocation can be delayed until other
conditions(resources) are allocated/successful. But it is also possible
that the other conditions(resource) are successful but netdevice
allocation gets failed, in that case allocating other resources before
may not be correct. For the success case, all the condition should succeed.
Currently the driver initialization has a order that already invokes
"netdev_clean:" in wilc_bus_probe() for failure cases, so it should free
up the resources completely. Additionally, this warning was reported at
run time(not only in probe). I believe if it is fix in
wilc_netdev_cleanup() then it will cover for all the scenarios.

Regards,
Ajay




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux