Johannes Berg <johannes@xxxxxxxxxxxxxxxx> writes: > On Wed, 2019-09-11 at 21:19 +0300, Kalle Valo wrote: >> > Looks like indeed the driver gives the device at least *3 seconds* for >> > every command, see ath10k_wmi_cmd_send(), so most likely this would >> > eventually have finished, but who knows how many firmware commands it >> > would still have attempted to send... >> >> 3 seconds is a bit short but in normal cases it should be enough. Of >> course we could increase the delay but I'm skeptic it would help here. > > I was thinking 3 seconds is way too long :-) Heh :) >> > Perhaps the driver should mark the device as dead and fail quickly once >> > it timed out once, or so, but I'll let Kalle comment on that. >> >> Actually we do try to restart the device when a timeout happens in >> ath10k_wmi_cmd_send(): >> >> if (ret == -EAGAIN) { >> ath10k_warn(ar, "wmi command %d timeout, restarting hardware\n", >> cmd_id); >> queue_work(ar->workqueue, &ar->restart_work); >> } > > Yeah, and this is the problem, in a sense, I'd think. It seems to me > that at this point the code needs to tag the device as "dead" and > immediately return from any further commands submitted to it with an > error (e.g. -EIO). Yeah, ath10k_core_restart() is supposed change to state ATH10K_STATE_RESTARTING but very few mac80211 ops in ath10k_ops are checking for it, and to me it looks like quite late even. I think a proper fix for ops which can sleep is to check ar->state is ATH10K_STATE_ON and for ops which cannot sleep check ATH10K_FLAG_CRASH_FLUSH. But of course this just fixes the symptoms, the root cause for timeouts needs to be found as well. -- https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches