On Sat, 1 Jul 2023 at 03:49, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
>
> As talked about in commit d66d24ac300c ("ath10k: Keep track of which
> interrupts fired, don't poll them"),

Hi Douglas,

Does this fix have a dependency on the upstream commit d66d24ac300c
that you refer to above? I ask because this patch recently landed on
the stable v5.4.y branch, and I now see RCU stalls and lockups around
the "ath10k_snoc 18800000.wifi: failed to receive control response
completion, polling.." message during ath10k_snoc
initialization/bringup on DB845c. Here is the relevant log:
https://www.irccloud.com/pastebin/raw/NjKm3mLc
The DB845c eventually reboots into USB crash dump mode.

I wonder whether commit d66d24ac300c needs to be backported to v5.4.y
as well? I tried cherry-picking it but ran into non-trivial conflicts,
so I didn't spend much time on it.

Regards,
Amit Pundir

> if we access the copy engine
> register at a bad time then ath10k can go boom. However, it's not
> necessarily easy to know when it's safe to access them.
>
> The ChromeOS test labs saw a crash that looked like this at
> shutdown/reboot time (on a chromeos-5.15 kernel, but likely the
> problem could also reproduce upstream):
>
> Internal error: synchronous external abort: 96000010 [#1] PREEMPT SMP
> ...
> CPU: 4 PID: 6168 Comm: reboot Not tainted 5.15.111-lockdep-19350-g1d624fe6758f #1 010b9b233ab055c27c6dc88efb0be2f4e9e86f51
> Hardware name: Google Kingoftown (DT)
> ...
> pc : ath10k_snoc_read32+0x50/0x74 [ath10k_snoc]
> lr : ath10k_snoc_read32+0x24/0x74 [ath10k_snoc]
> ...
> Call trace:
> ath10k_snoc_read32+0x50/0x74 [ath10k_snoc ...]
> ath10k_ce_disable_interrupt+0x190/0x65c [ath10k_core ...]
> ath10k_ce_disable_interrupts+0x8c/0x120 [ath10k_core ...]
> ath10k_snoc_hif_stop+0x78/0x660 [ath10k_snoc ...]
> ath10k_core_stop+0x13c/0x1ec [ath10k_core ...]
> ath10k_halt+0x398/0x5b0 [ath10k_core ...]
> ath10k_stop+0xfc/0x1a8 [ath10k_core ...]
> drv_stop+0x148/0x6b4 [mac80211 ...]
> ieee80211_stop_device+0x70/0x80 [mac80211 ...]
> ieee80211_do_stop+0x10d8/0x15b0 [mac80211 ...]
> ieee80211_stop+0x144/0x1a0 [mac80211 ...]
> __dev_close_many+0x1e8/0x2c0
> dev_close_many+0x198/0x33c
> dev_close+0x140/0x210
> cfg80211_shutdown_all_interfaces+0xc8/0x1e0 [cfg80211 ...]
> ieee80211_remove_interfaces+0x118/0x5c4 [mac80211 ...]
> ieee80211_unregister_hw+0x64/0x1f4 [mac80211 ...]
> ath10k_mac_unregister+0x4c/0xf0 [ath10k_core ...]
> ath10k_core_unregister+0x80/0xb0 [ath10k_core ...]
> ath10k_snoc_free_resources+0xb8/0x1ec [ath10k_snoc ...]
> ath10k_snoc_shutdown+0x98/0xd0 [ath10k_snoc ...]
> platform_shutdown+0x7c/0xa0
> device_shutdown+0x3e0/0x58c
> kernel_restart_prepare+0x68/0xa0
> kernel_restart+0x28/0x7c
>
> Though there's no known way to reproduce the problem, it makes sense
> that it would be the same issue where we're trying to access copy
> engine registers when it's not allowed.
>
> Let's fix this by changing how we "disable" the interrupts. Instead of
> tweaking the copy engine registers we'll just use disable_irq() and
> enable_irq(). Then we'll configure the interrupts once at power up
> time.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2.c10-00754-QCAHLSWMTPL-1
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
>
>  drivers/net/wireless/ath/ath10k/snoc.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index 26214c00cd0d..2c39bad7ebfb 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -828,12 +828,20 @@ static void ath10k_snoc_hif_get_default_pipe(struct ath10k *ar,
>
>  static inline void ath10k_snoc_irq_disable(struct ath10k *ar)
>  {
> -	ath10k_ce_disable_interrupts(ar);
> +	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> +	int id;
> +
> +	for (id = 0; id < CE_COUNT_MAX; id++)
> +		disable_irq(ar_snoc->ce_irqs[id].irq_line);
>  }
>
>  static inline void ath10k_snoc_irq_enable(struct ath10k *ar)
>  {
> -	ath10k_ce_enable_interrupts(ar);
> +	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
> +	int id;
> +
> +	for (id = 0; id < CE_COUNT_MAX; id++)
> +		enable_irq(ar_snoc->ce_irqs[id].irq_line);
>  }
>
>  static void ath10k_snoc_rx_pipe_cleanup(struct ath10k_snoc_pipe *snoc_pipe)
> @@ -1090,6 +1098,8 @@ static int ath10k_snoc_hif_power_up(struct ath10k *ar,
>  		goto err_free_rri;
>  	}
>
> +	ath10k_ce_enable_interrupts(ar);
> +
>  	return 0;
>
>  err_free_rri:
> @@ -1253,8 +1263,8 @@ static int ath10k_snoc_request_irq(struct ath10k *ar)
>
>  	for (id = 0; id < CE_COUNT_MAX; id++) {
>  		ret = request_irq(ar_snoc->ce_irqs[id].irq_line,
> -				  ath10k_snoc_per_engine_handler, 0,
> -				  ce_name[id], ar);
> +				  ath10k_snoc_per_engine_handler,
> +				  IRQF_NO_AUTOEN, ce_name[id], ar);
>  		if (ret) {
>  			ath10k_err(ar,
>  				   "failed to register IRQ handler for CE %d: %d\n",
> --
> 2.41.0.255.g8b1d071c50-goog
>