Search Linux Wireless

Re: [BUG] deadlock in nl80211_vendor_cmd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2022-03-25 at 20:36 +0000, William McVicker wrote:
> 
> I found that my wlan driver is using the vendor commands to create/delete NAN
> interfaces for this Android feature called Wi-Fi aware [1]. Basically, this
> features allows users to discover other nearby devices and allows them to
> connect directly with one another over a local network. 
> 

Wait, why is it doing that? We actually support a NAN interface type
upstream :) It's not really quite fully fleshed out, but it could be?
Probably should be?


> Thread 1                         Thread 2
>  nl80211_pre_doit():
>    rtnl_lock()
>    wiphy_lock()                   nl80211_pre_doit():
>                                     rtnl_lock() // blocked by Thread 1
>  nl80211_vendor_cmd():
>    doit()
>      cfg80211_unregister_netdevice()
>    rtnl_unlock():
>      netdev_run_todo():
>        __rtnl_unlock()
>                                     <got RTNL lock>
>                                     wiphy_lock() // blocked by Thread 1
>        rtnl_lock(); // DEADLOCK
>  nl80211_post_doit():
>    wiphy_unlock();


Right, this is what I had discussed in my other mails.

Basically, you're actually doing (some form of) unregister_netdevice()
before rtnl_unlock().

Clearly this isn't possible in cfg80211 itself.

However, I couldn't entirely discount the possibility that this is
possible:

Thread 1                   Thread 2
                            rtnl_lock()
                            unregister_netdevice()
                            __rtnl_unlock()
rtnl_lock()
wiphy_lock()
netdev_run_todo()
 __rtnl_unlock()
 // list not empty now    
 // because of thread 2     rtnl_lock()
 rtnl_lock()
                            wiphy_lock()

** DEADLOCK **


Given my other discussion with Jakub though, it seems that we can indeed
make sure that this cannot happen, and then this scenario is impossible
without the unregistration you're doing.

> Since I'm unlocking the RTNL inside nl80211_vendor_cmd() after calling doit()
> instead of waiting till post_doit(), I get into the situation you mentioned
> where the net_todo_list is not empty when calling rtnl_unlock. So I decided to
> drop the rtnl_unlock() in nl80211_vendor_cmd() and defer that until
> nl80211_post_doit() after calling wiphy_unlock(). With this change, I haven't
> been able to reproduce the deadlock. So it's possible that we aren't actually
> able to hit this deadlock in nl80211_pre_doit() with the existing code since,
> as you mentioned, one wouldn't be able to call unregister_netdevice() without
> having the RTNL lock.
> 

Right, this is why I said earlier that actually adding a flag for vendor
commands to get the RTNL would be more complex - you'd have to basically
open-code pre_doit() and post_doit() in there and check the sub-command
flag at the very beginning and very end.

johannes



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux