Search Linux Wireless

[PATCH] ath9k: Add RX inactivity detection and reset chip when it occurs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Some ath9k chips can, seemingly at random, end up in a state which can
be described as "deaf". No or nearly no interrupts are generated anymore
for incoming packets. Existing links either break down after a while and
new links will not be established.

The circumstances leading to this "deafness" is still unclear, but some
particular chips (especially 2-stream 11n SoCs, but also others) can go
'deaf' when running AP or mesh (or both) after some time. It's probably
a hardware issue, and doing a channel scan to trigger a chip
reset (which one normally can't do on an AP interface) recovers the
hardware.

The only way the driver can detect this state, is by detecting if there
has been no RX activity for a while. In this case we can proactively
reset the chip (which only takes a small number of milliseconds, so
shouldn't interrupt things too much if it has been idle for several
seconds), which functions as a workaround.

OpenWrt, and various derivatives, have been carrying versions of this
workaround for years, that were never upstreamed. One version[0],
written by Felix Fietkau, used a simple counter and only reset if there
was precisely zero RX activity for a long period of time. This had the
problem that in some cases a small number of interrupts would appear
even if the device was otherwise not responsive. For this reason,
another version[1], written by Simon Wunderlich and Sven Eckelmann, used
a time-based approach to calculate the average number of RX interrupts
over a longer (four-second) interval, and reset the chip when seeing
less than one interrupt per second over this period. However, that
version relied on debugfs counters to keep track of the number of
interrupts, which means it didn't work at all if debugfs was not
enabled.

This patch unifies the two versions: it uses the same approach as Felix'
patch to count the number of RX handler invocations, but uses the same
time-based windowing approach as Simon and Sven's patch to still handle
the case where occasional interrupts appear but the device is otherwise
deaf.

Since this is based on ideas by all three people, but not actually
directly derived from any of the patches, I'm including Suggested-by
tags from Simon, Sven and Felix below, which should hopefully serve as
proper credit.

[0] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@xxxxxxxx/
[1] https://patchwork.kernel.org/project/linux-wireless/patch/20161117083614.19188-2-sven.eckelmann@xxxxxxxxxxxxx/

Suggested-by: Simon Wunderlich <sw@xxxxxxxxxxxxxxxxxx>
Suggested-by: Sven Eckelmann <se@xxxxxxxxxxxxxxxxxx>
Suggested-by: Felix Fietkau <nbd@xxxxxxxx>
Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxx>
---
 drivers/net/wireless/ath/ath9k/ath9k.h |  2 ++
 drivers/net/wireless/ath/ath9k/debug.c |  1 +
 drivers/net/wireless/ath/ath9k/debug.h |  1 +
 drivers/net/wireless/ath/ath9k/link.c  | 33 +++++++++++++++++++++++++++++++--
 drivers/net/wireless/ath/ath9k/main.c  |  1 +
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 29ca65a732a66c9452b19ad184f513f92e4c8c2c..bcfc8df0efe5b2c5510230e7f59592d51df8365e 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -1018,6 +1018,8 @@ struct ath_softc {
 
 	u8 gtt_cnt;
 	u32 intrstatus;
+	u32 rx_active_check_time;
+	u32 rx_active_count;
 	u16 ps_flags; /* PS_* */
 	bool ps_enabled;
 	bool ps_idle;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 51abc470125b3cc4e773f5272e2394c8f790a32e..a1e3bcf796f27fd58f5ab307a198927274f8f321 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -750,6 +750,7 @@ static int read_file_reset(struct seq_file *file, void *data)
 		[RESET_TYPE_CALIBRATION] = "Calibration error",
 		[RESET_TX_DMA_ERROR] = "Tx DMA stop error",
 		[RESET_RX_DMA_ERROR] = "Rx DMA stop error",
+		[RESET_TYPE_RX_INACTIVE] = "Rx path inactive",
 	};
 	int i;
 
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index 389459c04d14a5215783b916519d05419b970c20..cb3e75969875afed282b09ff2458dd8ca286d5b1 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -53,6 +53,7 @@ enum ath_reset_type {
 	RESET_TYPE_CALIBRATION,
 	RESET_TX_DMA_ERROR,
 	RESET_RX_DMA_ERROR,
+	RESET_TYPE_RX_INACTIVE,
 	__RESET_TYPE_MAX
 };
 
diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index d1e5767aab3cbc2ee91018edadf89a223cf1d759..d078a59d7d3cdbe623b507d29cc974b801f2deaf 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -50,7 +50,36 @@ static bool ath_tx_complete_check(struct ath_softc *sc)
 		"tx hung, resetting the chip\n");
 	ath9k_queue_reset(sc, RESET_TYPE_TX_HANG);
 	return false;
+}
+
+#define RX_INACTIVE_CHECK_INTERVAL (4 * MSEC_PER_SEC)
+
+static bool ath_hw_rx_inactive_check(struct ath_softc *sc)
+{
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	u32 interval, count;
+
+	interval = jiffies_to_msecs(jiffies - sc->rx_active_check_time);
+	count = sc->rx_active_count;
+
+	if (interval < RX_INACTIVE_CHECK_INTERVAL)
+		return true; /* too soon to check */
 
+	sc->rx_active_count = 0;
+	sc->rx_active_check_time = jiffies;
+
+	/* Need at least one interrupt per second, and we should only react if
+	 * we are within a factor two of the expected interval
+	 */
+	if (interval > RX_INACTIVE_CHECK_INTERVAL * 2 ||
+	    count >= interval / MSEC_PER_SEC)
+		return true;
+
+	ath_dbg(common, RESET,
+		"RX inactivity detected. Schedule chip reset\n");
+	ath9k_queue_reset(sc, RESET_TYPE_RX_INACTIVE);
+
+	return false;
 }
 
 void ath_hw_check_work(struct work_struct *work)
@@ -58,8 +87,8 @@ void ath_hw_check_work(struct work_struct *work)
 	struct ath_softc *sc = container_of(work, struct ath_softc,
 					    hw_check_work.work);
 
-	if (!ath_hw_check(sc) ||
-	    !ath_tx_complete_check(sc))
+	if (!ath_hw_check(sc) || !ath_tx_complete_check(sc) ||
+	    !ath_hw_rx_inactive_check(sc))
 		return;
 
 	ieee80211_queue_delayed_work(sc->hw, &sc->hw_check_work,
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index b92c89dad8deac7281e0165c9e1a566ba99ece65..998f717a1a86eee539058f6f4d9318dd459fc8dd 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -453,6 +453,7 @@ void ath9k_tasklet(struct tasklet_struct *t)
 			ath_rx_tasklet(sc, 0, true);
 
 		ath_rx_tasklet(sc, 0, false);
+		sc->rx_active_count++;
 	}
 
 	if (status & ATH9K_INT_TX) {

---
base-commit: c33f9c2728d0ccc7472e6239346c0fb3de556e0f
change-id: 20241105-ath9k-deaf-detection-0e26bb3243a6





[Index of Archives]     [Linux Host AP]     [ATH6KL]     [Linux Wireless Personal Area Network]     [Linux Bluetooth]     [Wireless Regulations]     [Linux Netdev]     [Kernel Newbies]     [Linux Kernel]     [IDE]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite Hiking]     [MIPS Linux]     [ARM Linux]     [Linux RAID]

  Powered by Linux