Search Linux Wireless

Re: [RFC] ath10k: Attempt to work around napi_synchronize hang.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 02/28/2018 09:31 AM, Michał Kazior wrote:
On 28 February 2018 at 02:22,  <greearb@xxxxxxxxxxxxxxx> wrote:
[...]
@@ -2086,8 +2087,28 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
        ath10k_pci_irq_disable(ar);
        ath10k_pci_irq_sync(ar);
        ath10k_pci_flush(ar);
-       napi_synchronize(&ar->napi);
-       napi_disable(&ar->napi);
+
+       /* Calling napi_disable twice in a row (w/out starting it and/or without
+        * having NAPI active leads to deadlock because napi_disable sets
+        * NAPI_STATE_SCHED and NAPI_STATE_NPSVC when it returns, as far as I
+        * can tell.  So, guard this call to napi_disable.  I believe the
+        * failure case is something like this:
+        * rmmod ath10k_pci ath10k_core
+        *   Firmware crashes before hif_stop is called by the rmmod path
+        *   The crash handling logic calls hif_stop
+         *   Then rmmod gets around to calling hif_stop, but spins endlessly
+        *   in napi_synchronize.
+        *
+        *  I think one way this could happen is that ath10k_stop checks
+        *  for state != ATH10K_STATE_OFF, but STATE_RESTARTING is also
+        *  a possibility.  That might be how we can have hif_stop called twice
+        *  without a hif_start in between. --Ben
+        */
+       if (ar->napi_enabled) {
+               napi_synchronize(&ar->napi);
+               napi_disable(&ar->napi);
+               ar->napi_enabled = false;
+       }

Looking at the code and your comment the described fail case seems
legit. I would consider tuning ath10k_stop() so that it calls
ath10k_halt() only if ar->state != OFF & ar->state != RESTARTING
though. Or did you try already?

I did not try tuning ath10k_stop().  The code in this area is quite complex,
and in my opinion trying to keep the start/stop calls exactly matched without
individual 'has_started' flags seems ripe for bugs.

While your approach will probably work it won't prevent other non-NAPI
bad things from happening. Even if there's nothing today it might
creep up in the future. And you'd need to update ahb.c too.

I'll update ahb.c to match.

Thanks,
Ben



Michał



--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc  http://www.candelatech.com




[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux