Hi Johan, On Fri, Mar 21, 2025 at 9:55 AM Johan Hovold <johan+linaro@xxxxxxxxxx> wrote: > > Add the missing memory barrier to make sure that the REO dest ring > descriptor is read after the head pointer to avoid using stale data on > weakly ordered architectures like aarch64. > > This may fix the ring-buffer corruption worked around by commit > f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination > ring") by silently discarding data, and may possibly also address user > reported errors like: > > ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set > > Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41 > > Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices") > Cc: stable@xxxxxxxxxxxxxxx # 5.6 > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005 > Signed-off-by: Johan Hovold <johan+linaro@xxxxxxxxxx> > --- > > As I reported here: > > https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@xxxxxxxxxxxxxxxxxxxx/ > > the ath11k and ath12k appear to be missing a number of memory barriers > that are required on weakly ordered architectures like aarch64 to avoid > memory corruption issues. > > Here's a fix for one more such case which people already seem to be > hitting. > > Note that I've seen one "msdu_done" bit not set warning also with this > patch so whether it helps with that at all remains to be seen. I'm CCing > Jens and Steev that see these warnings frequently and that may be able > to help out with testing. > > Johan > > > drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++--------- > 1 file changed, 16 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c > index 029ecf51c9ef..0a57b337e4c6 100644 > --- a/drivers/net/wireless/ath/ath11k/dp_rx.c > +++ b/drivers/net/wireless/ath/ath11k/dp_rx.c > @@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, > struct ath11k *ar; > struct hal_reo_dest_ring *desc; > enum hal_reo_dest_ring_push_reason push_reason; > - u32 cookie; > + u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0; > int i; > > for (i = 0; i < MAX_RADIOS; i++) > @@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, > try_again: > ath11k_hal_srng_access_begin(ab, srng); > > + /* Make sure descriptor is read after the head pointer. */ > + dma_rmb(); > + > while (likely(desc = > (struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab, > srng))) { > cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE, > - desc->buf_addr_info.info1); > + READ_ONCE(desc->buf_addr_info.info1)); > buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID, > cookie); > mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie); > @@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, > > num_buffs_reaped[mac_id]++; > > + info0 = READ_ONCE(desc->info0); > push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON, > - desc->info0); > + info0); > if (unlikely(push_reason != > HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) { > dev_kfree_skb_any(msdu); > @@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id, > continue; > } > > - rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 & > + rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0); > + rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0); > + > + rxcb->is_first_msdu = !!(rx_msdu_info0 & > RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU); > - rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 & > + rxcb->is_last_msdu = !!(rx_msdu_info0 & > RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU); > - rxcb->is_continuation = !!(desc->rx_msdu_info.info0 & > + rxcb->is_continuation = !!(rx_msdu_info0 & > RX_MSDU_DESC_INFO0_MSDU_CONTINUATION); > rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID, > - desc->rx_mpdu_info.meta_data); > + READ_ONCE(desc->rx_mpdu_info.meta_data)); > rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM, > - desc->rx_mpdu_info.info0); > + rx_mpdu_info0); > rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM, > - desc->info0); > + info0); > > rxcb->mac_id = mac_id; > __skb_queue_tail(&msdu_list[mac_id], msdu); > -- > 2.48.1 > While the fix is definitely a fix, it does not seem to help with the `msdu_done bit in attention is not set` message as I have seen it 43 times in the last 12 hours. -- Steev