Hi, > Basically, my wlan driver uses the wiphy_vendor_command ops to handle > a number of vendor specific operations. > I guess it's an out-of-tree driver, since I (hope I) fixed all the issues in the code here ... :) > One of them in particular deletes > a cfg80211 interface. There's quite normal API for that, why would you do that?! > The deadlock happens when thread 1 tries to take the > RTNL lock before calling cfg80211_unregister_device() while thread 2 is > inside nl80211_pre_doit(), holding the RTNL lock, and waiting on > wiphy_lock(). > > Here is the call flow: > > Thread 1: Thread 2: > > nl80211_pre_doit(): > -> rtnl_lock() > nl80211_pre_doit(): > -> rtnl_lock() > -> <blocked by Thread 1> > -> wiphy_lock() > -> rtnl_unlock() > -> <unblock Thread 1> > exit nl80211_pre_doit() > <Thread 2 got the RTNL lock> > -> wiphy_lock() > -> <blocked by Thread 1> > nl80211_doit() > -> nl80211_vendor_cmd() > -> rtnl_lock() <DEADLOCK> Yeah, I guess the way we invoke vendor commands now w/o RTNL held means you cannot safely acquire RTNL in them. I mean, the whole above thing basically collapses down to Thread 1 Thread 2 wiphy_lock(); // nl80211 rtnl_lock(); wiphy_lock(); rtnl_lock(); // your driver The correct order to _acquire_ these is rtnl -> wiphy, and we do it that way around everywhere (else). > I'm not an networking expert. So my main question is if I'm allowed to take > the RTNL lock inside the nl80211_vendor_cmd callbacks? Evidently, you're not. It's interesting though, it used to be that we called these with the RTNL held, now we don't, and the driver you're using somehow "got fixed" to take it, but whoever fixed it didn't take into account that this is not possible? > I hope that helps explain the issue. Let me know if you need any more > details. It does, but I don't think there's any way to fix it. You just fundamentally cannot acquire the RTNL in a vendor command operation since that introduced the ABBA deadlock you observed. Since it's an out-of-tree driver that's about as much as I can help. johannes