On 5/28/2024 8:44 PM, Aaradhana Sahu wrote:
> Whenever the firmware crashes in split-phy mode, the WARN_ON below is triggered:
>
>  ? __warn+0x7b/0x1a0
>  ? drv_stop+0x1eb/0x210 [mac80211]
>  ? report_bug+0x10b/0x200
>  ? handle_bug+0x3f/0x70
>  ? exc_invalid_op+0x13/0x60
>  ? asm_exc_invalid_op+0x16/0x20
>  ? drv_stop+0x1eb/0x210 [mac80211]
>  ieee80211_do_stop+0x5ba/0x850 [mac80211]
>  ieee80211_stop+0x51/0x180 [mac80211]
>  __dev_close_many+0xb3/0x130
>  dev_close_many+0xa3/0x180
>  ? lock_release+0xde/0x420
>  dev_close.part.147+0x5f/0xa0
>  cfg80211_shutdown_all_interfaces+0x44/0xe0 [cfg80211]
>  ieee80211_restart_work+0xf9/0x130 [mac80211]
>  process_scheduled_works+0x377/0x6f0
>
> The sequence that leads to the WARN_ON is:
> Thread 1:
> - The firmware crash calls ath12k_core_reset().
> - ieee80211_restart_hw() is called inside
>   ath12k_core_post_reconfigure_recovery(), which schedules the restart
>   worker for both hardware.
> - Thread 1 then waits for completion of ab->recovery_start.
>
> Thread 2 (worker thread):
> - One hardware acquires rtnl_lock() inside ieee80211_restart_work() and
>   calls ath12k_mac_wait_reconfigure() from ath12k_mac_op_start().
> - That hardware waits for ab->reconfigure_complete, but at this point
>   recovery_start_count is 1 because the other worker thread
>   (local->restart_work) is still waiting for rtnl_lock().
>   Since recovery_start_count does not equal the number of radios
>   (2 in split-phy), ab->recovery_start is never completed, so
>   thread 1 keeps waiting and cannot perform the hif power down/up
>   and firmware reload.
> - The wait for ab->reconfigure_complete times out and returns to the
>   caller (ath12k_mac_op_start()), which sends a WMI command to the
>   crashed firmware and gets an error.
> - This error is returned to drv_start(), and local->started is set to false.
> - After receiving the error, ieee80211_restart_work() calls
>   cfg80211_shutdown_all_interfaces(), which reaches drv_stop(), where
>   the WARN_ON triggers because local->started is false.
>
> To fix this issue, call ieee80211_restart_hw() after the firmware has been
> reloaded. Each hardware can then send WMI commands to the firmware
> successfully. With this fix there is no need to wait for
> ab->recovery_start completion, so remove
> ath12k_mac_wait_reconfigure().
>
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.0.1-00029-QCAHKSWPL_SILICONZ-1
> Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.1.1-00209-QCAHKSWPL_SILICONZ-1
> Tested-on: WCN7850 HW2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
>
> Signed-off-by: Aaradhana Sahu <quic_aarasahu@xxxxxxxxxxx>

Acked-by: Jeff Johnson <quic_jjohnson@xxxxxxxxxxx>