Search Linux Wireless

Re: [BUG] deadlock in nl80211_vendor_cmd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,


> Basically, my wlan driver uses the wiphy_vendor_command ops to handle
> a number of vendor specific operations.
> 

I guess it's an out-of-tree driver, since I (hope I) fixed all the
issues in the code here ... :)

> One of them in particular deletes
> a cfg80211 interface.

There's quite normal API for that, why would you do that?!

> The deadlock happens when thread 1 tries to take the
> RTNL lock before calling cfg80211_unregister_device() while thread 2 is
> inside nl80211_pre_doit(), holding the RTNL lock, and waiting on
> wiphy_lock().
> 
> Here is the call flow:
> 
> Thread 1:                         Thread 2:
> 
> nl80211_pre_doit():
>   -> rtnl_lock()
>                                       nl80211_pre_doit():
>                                        -> rtnl_lock()
>                                        -> <blocked by Thread 1>
>   -> wiphy_lock()
>   -> rtnl_unlock()
>   -> <unblock Thread 1>
> exit nl80211_pre_doit()
>                                        <Thread 2 got the RTNL lock>
>                                        -> wiphy_lock()
>                                        -> <blocked by Thread 1>
> nl80211_doit()
>   -> nl80211_vendor_cmd()
>       -> rtnl_lock() <DEADLOCK>

Yeah, I guess the way we invoke vendor commands now w/o RTNL held means
you cannot safely acquire RTNL in them.

I mean, the whole above thing basically collapses down to

 Thread 1                           Thread 2
  wiphy_lock(); // nl80211
                                     rtnl_lock();
                                     wiphy_lock();
  rtnl_lock();  // your driver

The correct order to _acquire_ these is rtnl -> wiphy, and we do it that
way around everywhere (else).

> I'm not an networking expert. So my main question is if I'm allowed to take
> the RTNL lock inside the nl80211_vendor_cmd callbacks?

Evidently, you're not. It's interesting though, it used to be that we
called these with the RTNL held, now we don't, and the driver you're
using somehow "got fixed" to take it, but whoever fixed it didn't take
into account that this is not possible?

> I hope that helps explain the issue. Let me know if you need any more
> details.

It does, but I don't think there's any way to fix it. You just
fundamentally cannot acquire the RTNL in a vendor command operation
since that introduced the ABBA deadlock you observed.

Since it's an out-of-tree driver that's about as much as I can help.

johannes



[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux