Lukas reports skb_over_panic errors on his Banana Pi BPI-CM4 which comes with an Amlogic A311D (G12B) SoC and a RTL8822CS SDIO wifi/Bluetooth combo card. The error he observed is identical to what has been fixed in commit e967229ead0e ("wifi: rtw88: sdio: Check the HISR RX_REQUEST bit in rtw_sdio_rx_isr()") but that commit didn't fix Lukas' problem. Lukas found that disabling or limiting RX aggregation works around the problem for some time (but does not fully fix it). In the following discussion a few key topics have been discussed which have an impact on this problem: - The Amlogic A311D (G12B) SoC has a hardware bug in the SDIO controller which prevents DMA transfers. Instead all transfers need to go through the controller SRAM which limits transfers to 1536 bytes - rtw88 chips don't split incoming (RX) packets, so if a big packet is received this is forwarded to the host in it's original form - rtw88 chips can do RX aggregation, meaning more multiple incoming packets can be pulled by the host from the card with one MMC/SDIO transfer. This Depends on settings in the REG_RXDMA_AGG_PG_TH register (BIT_RXDMA_AGG_PG_TH limits the number of packets that will be aggregated, BIT_DMA_AGG_TO_V1 configures a timeout for aggregation and BIT_EN_PRE_CALC makes the chip honor the limits more effectively) Use multiple consecutive reads in rtw_sdio_read_port() and limit the number of bytes which are copied by the host from the card in one MMC/SDIO transfer. This allows receiving a buffer that's larger than the hosts max_req_size (number of bytes which can be transferred in one MMC/SDIO transfer). As a result of this the skb_over_panic error is gone as the rtw88 driver is now able to receive more than 1536 bytes from the card (either because the incoming packet is larger than that or because multiple packets have been aggregated). In case of an receive errors (-EILSEQ has been observed by Lukas) we need to drain the remaining data from the card's buffer, otherwise the card will return corrupt data for the next rtw_sdio_read_port() call. Fixes: 65371a3f14e7 ("wifi: rtw88: sdio: Add HCI implementation for SDIO based chipsets") Reported-by: Lukas F. Hartmann <lukas@xxxxxxxxx> Closes: https://lore.kernel.org/linux-wireless/CAFBinCBaXtebixKbjkWKW_WXc5k=NdGNaGUjVE8NCPNxOhsb2g@xxxxxxxxxxxxxx/ Suggested-by: Ping-Ke Shih <pkshih@xxxxxxxxxxx> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@xxxxxxxxxxxxxx> --- Changes since v2 at [2]: - Don't initialize err to zero as that intiial value is never used. Thanks Ping-Ke for spotting this! - Add a comment explaning why we need to continue reading but still have to return an error to the caller of rtw_sdio_read_port() Changes since v1 at [0]: - We need to read all bytes if we split the transaction into multiple smaller reads. This is even the case when one of N reads reports an error. Otherwise the next read port call will return garbage (partially containing zeros, ...). A similar-ish approach can be found in the vendor driver, see [1] (specifically the call to sdio_recv_and_drop()) - Update the patch description accordingly With a preliminary version of this updated patch Lukas reported off- list: "i've been using this laptop for almost 3 hours with heavy wifi usage and so far no problems" [0] https://lore.kernel.org/lkml/169089906853.212423.17095176293160428610.kvalo@xxxxxxxxxx/T/ [1] https://github.com/chewitt/RTL8822CS/blob/ad1391e219b59314485739a499fb442d5bbc069e/hal/rtl8822c/sdio/rtl8822cs_io.c#L468-L477 [2] https://lore.kernel.org/linux-wireless/20230806181656.2072792-1-martin.blumenstingl@xxxxxxxxxxxxxx/ drivers/net/wireless/realtek/rtw88/sdio.c | 35 ++++++++++++++++++----- 1 file changed, 28 insertions(+), 7 deletions(-) diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c index 2c1fb2dabd40..0cae5746f540 100644 --- a/drivers/net/wireless/realtek/rtw88/sdio.c +++ b/drivers/net/wireless/realtek/rtw88/sdio.c @@ -500,19 +500,40 @@ static u32 rtw_sdio_get_tx_addr(struct rtw_dev *rtwdev, size_t size, static int rtw_sdio_read_port(struct rtw_dev *rtwdev, u8 *buf, size_t count) { struct rtw_sdio *rtwsdio = (struct rtw_sdio *)rtwdev->priv; + struct mmc_host *host = rtwsdio->sdio_func->card->host; bool bus_claim = rtw_sdio_bus_claim_needed(rtwsdio); u32 rxaddr = rtwsdio->rx_addr++; - int ret; + int ret = 0, err; + size_t bytes; if (bus_claim) sdio_claim_host(rtwsdio->sdio_func); - ret = sdio_memcpy_fromio(rtwsdio->sdio_func, buf, - RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr), count); - if (ret) - rtw_warn(rtwdev, - "Failed to read %zu byte(s) from SDIO port 0x%08x", - count, rxaddr); + while (count > 0) { + bytes = min_t(size_t, host->max_req_size, count); + + err = sdio_memcpy_fromio(rtwsdio->sdio_func, buf, + RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr), + bytes); + if (err) { + rtw_warn(rtwdev, + "Failed to read %zu byte(s) from SDIO port 0x%08x: %d", + bytes, rxaddr, err); + + /* Signal to the caller that reading did not work and + * that the data in the buffer is short/corrupted. + */ + ret = err; + + /* Don't stop here - instead drain the remaining data + * from the card's buffer, else the card will return + * corrupt data for the next rtw_sdio_read_port() call. + */ + } + + count -= bytes; + buf += bytes; + } if (bus_claim) sdio_release_host(rtwsdio->sdio_func); -- 2.42.1