On 8/29/2024 7:39 PM, Baochen Qiang wrote:
> From: Wen Gong <quic_wgong@xxxxxxxxxxx>
>
> Running this test in a loop it is easy to reproduce an rtnl deadlock:
>
> iw reg set FI
> ifconfig wlan0 down
>
> What happens is that thread A (workqueue) tries to update the regulatory:
>
> try to acquire the rtnl_lock of ar->regd_update_work
>
> rtnl_lock
> ath12k_regd_update [ath12k]
> ath12k_regd_update_work [ath12k]
> process_one_work
> worker_thread
> kthread
> ret_from_fork
>
> And thread B (ifconfig) tries to stop the interface:
>
> try to cancel_work_sync(&ar->regd_update_work) in ath12k_mac_op_stop().
> ifconfig 3109 [003] 2414.232506: probe:
>
> ath12k_mac_op_stop [ath12k]
> drv_stop [mac80211]
> ieee80211_do_stop [mac80211]
> ieee80211_stop [mac80211]
>
> The sequence of deadlock is:
>
> 1. Thread B calls rtnl_lock().
>
> 2. Thread A starts to run and calls rtnl_lock() from within
> ath12k_regd_update_work(), then enters wait state because the lock is owned by

checkpatch complains that the commit description exceeds 75 columns;
at a minimum, you should avoid exceeding 80 columns.

Kalle, do you want to reformat when you pull into pending? Or are you OK with
the current formatting?

> thread B.
>
> 3. Thread B tries to call cancel_work_sync(&ar->regd_update_work), but thread A is in
> ath12k_regd_update_work() waiting for rtnl_lock(). So cancel_work_sync()
> forever waits for ath12k_regd_update_work() to finish and we have a deadlock.
>
> Change to use regulatory_set_wiphy_regd(), which is the asynchronous version of
> regulatory_set_wiphy_regd_sync(). This way rtnl & wiphy locks are not required so can
> be removed, and in the end the deadlock issue can be avoided.
>
> But a side effect introduced by the asynchronous regd update is that, some essential
> information used in ath12k_reg_update_chan_list(), which would be called later in
> ath12k_regd_update(), might has not been updated by cfg80211, as a result wrong
> channel parameters sent to firmware.
>
> To handle this side effect, move ath12k_reg_update_chan_list() to ath12k_reg_notifier(),
> and advertise WIPHY_FLAG_NOTIFY_REGDOM_BY_DRIVER to cfg80211. This works because,
> in the process of the asynchronous regd update, after the new regd is processed,
> cfg80211 will notify ath12k by calling ath12k_reg_notifier(). Since all essential
> information is updated at that time, we are good to do channel list update.
>
> Please note ath12k_reg_notifier() could also be called due to other reasons, like
> core/beacon/user hints etc. For them we are not allowed to call
> ath12k_reg_update_chan_list() because regd has not been updated. This is done by
> verifying the initiator.
>
> Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
>
> Signed-off-by: Wen Gong <quic_wgong@xxxxxxxxxxx>
> Co-developed-by: Baochen Qiang <quic_bqiang@xxxxxxxxxxx>
> Signed-off-by: Baochen Qiang <quic_bqiang@xxxxxxxxxxx>

code change itself LGTM, so...

Acked-by: Jeff Johnson <quic_jjohnson@xxxxxxxxxxx>
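For illustration only, the ordering described in the patch (hand the regd to cfg80211 asynchronously, then rebuild the channel list from the notifier, and only for driver-initiated updates) can be modeled in plain userspace C. This is a sketch under stated assumptions, not the ath12k code: `struct model_ar`, `enum reg_initiator`, and every `model_*` function are hypothetical stand-ins. In the kernel, the real notifier receives a `struct regulatory_request` whose `initiator` field would be compared against `NL80211_REGDOM_SET_BY_DRIVER`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's enum nl80211_reg_initiator. */
enum reg_initiator {
	REG_SET_BY_CORE,
	REG_SET_BY_USER,
	REG_SET_BY_DRIVER,
	REG_SET_BY_COUNTRY_IE,
};

/* Hypothetical model of driver state (not struct ath12k). */
struct model_ar {
	int pending_regd;      /* regd handed to "cfg80211", not yet applied */
	int cfg80211_regd;     /* regd currently applied by "cfg80211" */
	int fw_chan_list_regd; /* regd behind the last channel list sent to fw */
	int chan_list_updates; /* number of channel list updates sent to fw */
};

/* Driver side: hand the new regd over without waiting, so no rtnl or
 * wiphy lock is needed here (models the asynchronous API). */
static void model_set_regd_async(struct model_ar *ar, int regd)
{
	ar->pending_regd = regd;
}

/* Notifier: rebuild the channel list only for driver-initiated updates;
 * core/user/beacon hints return early because regd was not updated. */
static void model_reg_notifier(struct model_ar *ar, enum reg_initiator who)
{
	if (who != REG_SET_BY_DRIVER)
		return;
	ar->fw_chan_list_regd = ar->cfg80211_regd;
	ar->chan_list_updates++;
}

/* "cfg80211" side, later and in its own context: apply the new regd
 * first, then notify the driver, so the channel list sees fresh data. */
static void model_cfg80211_process(struct model_ar *ar)
{
	ar->cfg80211_regd = ar->pending_regd;
	model_reg_notifier(ar, REG_SET_BY_DRIVER);
}
```

The point the model tries to capture is that the channel list is never built from state that "cfg80211" has not applied yet: a non-driver initiator leaves it untouched, and the driver-initiated notification only arrives after the pending regd has been applied.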