Hi Johan, On Fri, Mar 21, 2025 at 4:51 AM Johan Hovold <johan+linaro@xxxxxxxxxx> wrote: > > Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes > breaks and the log fills up with errors like: > > ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492 > ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484 > > which based on a quick look at the driver seemed to indicate some kind > of ring-buffer corruption. > > Miaoqing Pan tracked it down to the host seeing the updated destination > ring head pointer before the updated descriptor, and the error handling > for that in turn leaves the ring buffer in an inconsistent state. > > Add the missing memory barrier to make sure that the descriptor is read > after the head pointer to address the root cause of the corruption while > fixing up the error handling in case there are ever any (ordering) bugs > on the device side. > > Note that the READ_ONCE() are only needed to avoid compiler mischief in > case the ring-buffer helpers are ever inlined. > > Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 > > Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623 > Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@xxxxxxxxxxx > Cc: Miaoqing Pan <quic_miaoqing@xxxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx # 5.6 > Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx> > --- > drivers/net/wireless/ath/ath11k/ce.c | 11 +++++------ > drivers/net/wireless/ath/ath11k/hal.c | 4 ++-- > 2 files changed, 7 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c > index e66e86bdec20..9d8efec46508 100644 > --- a/drivers/net/wireless/ath/ath11k/ce.c > +++ b/drivers/net/wireless/ath/ath11k/ce.c > @@ -393,11 +393,10 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe, > goto err; > } > > + /* Make sure descriptor is read after the head pointer. */ > + dma_rmb(); > + > *nbytes = ath11k_hal_ce_dst_status_get_length(desc); > - if (*nbytes == 0) { > - ret = -EIO; > - goto err; > - } > > *skb = pipe->dest_ring->skb[sw_index]; > pipe->dest_ring->skb[sw_index] = NULL; > @@ -430,8 +429,8 @@ static void ath11k_ce_recv_process_cb(struct ath11k_ce_pipe *pipe) > dma_unmap_single(ab->dev, ATH11K_SKB_RXCB(skb)->paddr, > max_nbytes, DMA_FROM_DEVICE); > > - if (unlikely(max_nbytes < nbytes)) { > - ath11k_warn(ab, "rxed more than expected (nbytes %d, max %d)", > + if (unlikely(max_nbytes < nbytes || nbytes == 0)) { > + ath11k_warn(ab, "unexpected rx length (nbytes %d, max %d)", > nbytes, max_nbytes); > dev_kfree_skb_any(skb); > continue; > diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c > index 61f4b6dd5380..8cb1505a5a0c 100644 > --- a/drivers/net/wireless/ath/ath11k/hal.c > +++ b/drivers/net/wireless/ath/ath11k/hal.c > @@ -599,7 +599,7 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf) > struct hal_ce_srng_dst_status_desc *desc = buf; > u32 len; > > - len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags); > + len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, READ_ONCE(desc->flags)); > desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN; > > return len; > @@ -829,7 +829,7 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng) > srng->u.src_ring.cached_tp = > *(volatile u32 *)srng->u.src_ring.tp_addr; > } else { > - srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr; > + srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); > > /* Try to prefetch the next descriptor in the ring */ > if (srng->flags & HAL_SRNG_FLAGS_CACHED) > -- > 2.48.1 > Like the others, I was seeing this message multiple times a day, and on a not so rare occasion it would also take the wifi device offline, with this patch applied, I do not see that at all here. Tested-by: Steev Klimaszewski <steev@xxxxxxxx>