Wen Gong <wgong@xxxxxxxxxxxxxx> writes: > When it has more than one restart_work queued meanwhile, the 2nd > restart_work is very esay to break the 1st restart work and lead > recovery fail. > > Add a ref count to allow only one restart work running untill > device successfully recovered. > > Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. > > Signed-off-by: Wen Gong <wgong@xxxxxxxxxxxxxx> > --- > drivers/net/wireless/ath/ath10k/core.c | 13 +++++++++++++ > drivers/net/wireless/ath/ath10k/core.h | 2 ++ > drivers/net/wireless/ath/ath10k/mac.c | 1 + > 3 files changed, 16 insertions(+) > > diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c > index 91f131b87efc..0e31846e6c89 100644 > --- a/drivers/net/wireless/ath/ath10k/core.c > +++ b/drivers/net/wireless/ath/ath10k/core.c > @@ -2199,6 +2199,14 @@ static void ath10k_core_restart(struct work_struct *work) > { > struct ath10k *ar = container_of(work, struct ath10k, restart_work); > int ret; > + int restart_count; > + > + restart_count = atomic_add_return(1, &ar->restart_count); > + if (restart_count > 1) { > + ath10k_warn(ar, "can not restart, count: %d\n", restart_count); > + atomic_dec(&ar->restart_count); > + return; > + } I have been thinking a different approach for this. I think another option is to have a function like this: ath10k_core_firmware_crashed() { queue_work(ar->workqueue, &ar->restart_work); } In patch 1 we would convert all existing callers to call that function instead of queue_work() directly. In patch 2 we would add a new flag to enum ath10k_dev_flags, or maybe should actually use existing ATH10K_FLAG_CRASH_FLUSH? Don't know yet which one is better. Now the function would do: ath10k_core_firmware_crashed() { if (test_bit(flag)) return set_bit(flag) queue_work(ar->workqueue, &ar->restart_work); } That way restart_work queue would be called only one time. Though I'm not sure how ATH10K_STATE_WEDGED would behave after this change, it might get broken. Ah, actually I think even this patch breaks the WEDGED state. This firmware restart is tricky, difficult to say what is the best approach. Michal, are you reading? :) Any ideas? And after looking more about this patch I don't see the need for the new ar->restart_count atomic variable. Checking for ATH10K_FLAG_CRASH_FLUSH would do the same thing AFAICS. And related to this, (in a separate patch) I think we should utilise ATH10K_FLAG_CRASH_FLUSH more. For example in ath10k_wmi_cmd_send() to not even try to send a WMI command if the flag is set. Basically all hardware access should be disabled except what is needed to restart the firmware. -- https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches