I can confirm that the machine reboots with this patch applied. Harald Arnesen Johannes Berg [27.04.2021 11:49]: > From: Johannes Berg <johannes.berg@xxxxxxxxx> > > Harald Arnesen reported [1] a deadlock at reboot time, and after > he captured a stack trace a picture developed of what's going on: > > The distribution he's using is using iwd (not wpa_supplicant) to > manage wireless. iwd will usually use the "socket owner" option > when it creates new interfaces, so that they're automatically > destroyed when it quits (unexpectedly or otherwise). This is also > done by wpa_supplicant, but it doesn't do it for the normal one, > only for additional ones, which is different with iwd. > > Anyway, during shutdown, iwd quits while the netdev is still UP, > i.e. IFF_UP is set. This causes the stack trace that Linus so > nicely transcribed from the pictures: > > cfg80211_destroy_iface_wk() takes wiphy_lock > -> cfg80211_destroy_ifaces() > ->ieee80211_del_iface > ->ieeee80211_if_remove > ->cfg80211_unregister_wdev > ->unregister_netdevice_queue > ->dev_close_many > ->__dev_close_many > ->raw_notifier_call_chain > ->cfg80211_netdev_notifier_call > and that last call tries to take wiphy_lock again. > > In commit a05829a7222e ("cfg80211: avoid holding the RTNL when > calling the driver") I had taken into account the possibility of > recursing from cfg80211 into cfg80211_netdev_notifier_call() via > the network stack, but only for NETDEV_UNREGISTER, not for what > happens here, NETDEV_GOING_DOWN and NETDEV_DOWN notifications. > > Additionally, while this worked still back in commit 78f22b6a3a92 > ("cfg80211: allow userspace to take ownership of interfaces"), it > missed another corner case: unregistering a netdev will cause > dev_close() to be called, and thus stop wireless operations (e.g. > disconnecting), but there are some types of virtual interfaces in > wifi that don't have a netdev - for that we need an additional > call to cfg80211_leave(). > > So, to fix this mess, change cfg80211_destroy_ifaces() to not > require the wiphy_lock(), but instead make it acquire it, but > only after it has actually closed all the netdevs on the list, > and then call cfg80211_leave() as well before removing them > from the driver, to fix the second issue. The locking change in > this requires modifying the nl80211 call to not get the wiphy > lock passed in, but acquire it by itself after flushing any > potentially pending destruction requests. > > [1] https://lore.kernel.org/r/09464e67-f3de-ac09-28a3-e27b7914ee7d@xxxxxxxxxxx > > Cc: stable@xxxxxxxxxxxxxxx # 5.12 > Reported-by: Harald Arnesen <harald@xxxxxxxxxxx> > Fixes: 776a39b8196d ("cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held") > Fixes: 78f22b6a3a92 ("cfg80211: allow userspace to take ownership of interfaces") > Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx> > --- > Linus, I'll send this the regular way, just CC'ing you since > you were involved in the debug. > --- > net/wireless/core.c | 21 +++++++++++++++++---- > net/wireless/nl80211.c | 24 +++++++++++++++++++----- > 2 files changed, 36 insertions(+), 9 deletions(-) > > diff --git a/net/wireless/core.c b/net/wireless/core.c > index a2785379df6e..589ee5a69a2e 100644 > --- a/net/wireless/core.c > +++ b/net/wireless/core.c > @@ -332,14 +332,29 @@ static void cfg80211_event_work(struct work_struct *work) > void cfg80211_destroy_ifaces(struct cfg80211_registered_device *rdev) > { > struct wireless_dev *wdev, *tmp; > + bool found = false; > > ASSERT_RTNL(); > - lockdep_assert_wiphy(&rdev->wiphy); > > + list_for_each_entry(wdev, &rdev->wiphy.wdev_list, list) { > + if (wdev->nl_owner_dead) { > + if (wdev->netdev) > + dev_close(wdev->netdev); > + found = true; > + } > + } > + > + if (!found) > + return; > + > + wiphy_lock(&rdev->wiphy); > list_for_each_entry_safe(wdev, tmp, &rdev->wiphy.wdev_list, list) { > - if (wdev->nl_owner_dead) > + if (wdev->nl_owner_dead) { > + cfg80211_leave(rdev, wdev); > rdev_del_virtual_intf(rdev, wdev); > + } > } > + wiphy_unlock(&rdev->wiphy); > } > > static void cfg80211_destroy_iface_wk(struct work_struct *work) > @@ -350,9 +365,7 @@ static void cfg80211_destroy_iface_wk(struct work_struct *work) > destroy_work); > > rtnl_lock(); > - wiphy_lock(&rdev->wiphy); > cfg80211_destroy_ifaces(rdev); > - wiphy_unlock(&rdev->wiphy); > rtnl_unlock(); > } > > diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c > index b1df42e4f1eb..a5224da63832 100644 > --- a/net/wireless/nl80211.c > +++ b/net/wireless/nl80211.c > @@ -3929,7 +3929,7 @@ static int nl80211_set_interface(struct sk_buff *skb, struct genl_info *info) > return err; > } > > -static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) > +static int _nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) > { > struct cfg80211_registered_device *rdev = info->user_ptr[0]; > struct vif_params params; > @@ -3938,9 +3938,6 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) > int err; > enum nl80211_iftype type = NL80211_IFTYPE_UNSPECIFIED; > > - /* to avoid failing a new interface creation due to pending removal */ > - cfg80211_destroy_ifaces(rdev); > - > memset(¶ms, 0, sizeof(params)); > > if (!info->attrs[NL80211_ATTR_IFNAME]) > @@ -4028,6 +4025,21 @@ static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) > return genlmsg_reply(msg, info); > } > > +static int nl80211_new_interface(struct sk_buff *skb, struct genl_info *info) > +{ > + struct cfg80211_registered_device *rdev = info->user_ptr[0]; > + int ret; > + > + /* to avoid failing a new interface creation due to pending removal */ > + cfg80211_destroy_ifaces(rdev); > + > + wiphy_lock(&rdev->wiphy); > + ret = _nl80211_new_interface(skb, info); > + wiphy_unlock(&rdev->wiphy); > + > + return ret; > +} > + > static int nl80211_del_interface(struct sk_buff *skb, struct genl_info *info) > { > struct cfg80211_registered_device *rdev = info->user_ptr[0]; > @@ -15040,7 +15052,9 @@ static const struct genl_small_ops nl80211_small_ops[] = { > .doit = nl80211_new_interface, > .flags = GENL_UNS_ADMIN_PERM, > .internal_flags = NL80211_FLAG_NEED_WIPHY | > - NL80211_FLAG_NEED_RTNL, > + NL80211_FLAG_NEED_RTNL | > + /* we take the wiphy mutex later ourselves */ > + NL80211_FLAG_NO_WIPHY_MTX, > }, > { > .cmd = NL80211_CMD_DEL_INTERFACE, > -- Hilsen Harald