Dne sobota, 13. maj 2023 ob 21:55:51 CEST je Larry Finger napisal(a): > On 5/13/23 05:23, Jernej Škrabec wrote: > > Larry, > > > > Dne sreda, 10. maj 2023 ob 23:47:24 CEST je Larry Finger napisal(a): > >> On 5/10/23 16:07, Martin Blumenstingl wrote: > >>> On Wed, May 10, 2023 at 12:02 AM Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote: > >>> [...] > >>>> I added that patch to the driver. The user reports that he was able to do a ping > >>>> and an nslookup before it crashed with the following in the log: > >>> That's some positive news alongside the crash log: it seems that a > >>> part of the driver works! :-) > >>> > >>>> [ 8.700626] skbuff: skb_over_panic: text:ffff8000011924ac len:3341 put:3341 > >>>> head:ffff000003b3c000 data:ffff000003b3c040 tail:0xd4d end:0x2c0 dev:<NULL> > >>> [...] > >>>> Somehow skb->tail was greater than skb->end. Unfortunately I do not have access > >>>> to gdb to tell you what line corresponds to rtw_sdio_rx_skb+0x50 on the MangoPi > >>>> MQ Quad. > >>> I need to have a closer look at the pkg_offset and struct > >>> rtw_rx_pkt_stat which we receive. > >>> Recently my own MangoPI MQ-Quad arrived but I did not have the time to > >>> set it up yet. I'll try to do so during the weekend so I can debug > >>> this on my own. > >>> > >>> Please ping me next week in case I haven't provided any update until then. > >> > >> I have some test prints in to check for skb overrun. My initial indication is > >> that the problem was in the c2h branch of rtw_sdio_rx_skb(), but my next run > >> should verify that. My changes will do a pr_warn_once() when the problem > >> happens, and then drop the skb. > >> > >> My contact reported that he had one run of 3 minutes before the problem > >> happened, which is good news for most of the driver. > > > > I may have discovered something interesting. rtl8723ds vendor driver has > > following checks in RX data parsing code: > > https://github.com/lwfinger/rtl8723ds/blob/master/hal/rtl8723d/sdio/rtl8723ds_recv.c#L83-L99 > > > > Those checks are absent in rtl8822bs vendor driver, which was my original > > development platform for SDIO. This may indicate some kind of bug in FW > > and/or HW. > > > > I think that at least second check, which checks for exactly the case your > > client experience, can be easily added and tested. > > Thanks for this update. I added the following to the start of rtw_sdio_rx_skb(): > /* fix Hardware RX data error, drop whole recv_buffer */ > if (!(rtwdev->hal.rcr & BIT_ACRC32) && pkt_stat->crc_err) { > kfree_skb(skb); > return; > } > I think that duplicates the code in the vendor driver. > > I have not heard from my user as to whether it helps. My communications with him > are at https://github.com/lwfinger/rtl8723ds/issues/37. I had second part in mind (see attachment), but this is IMO only sanity check and it will mask the issue. At this point I'm not sure if this is something that can happen occasionally or is there additional bug in rtw88 code. I'll check rtl8723ds c2h code in greater detail. In any case, I would argue that all 3 patches in this thread are valid and should be submitted upstream. Best regards, Jernej
diff --git a/drivers/net/wireless/realtek/rtw88/sdio.c b/drivers/net/wireless/realtek/rtw88/sdio.c index af0459a79899..d69ce76ad0f4 100644 --- a/drivers/net/wireless/realtek/rtw88/sdio.c +++ b/drivers/net/wireless/realtek/rtw88/sdio.c @@ -975,7 +975,13 @@ static void rtw_sdio_rxfifo_recv(struct rtw_dev *rtwdev, u32 rx_len) curr_pkt_len = ALIGN(pkt_offset + pkt_stat.pkt_len, RTW_SDIO_DATA_PTR_ALIGN); - if ((curr_pkt_len + pkt_desc_sz) >= rx_len) { + if ((curr_pkt_len + pkt_desc_sz) > rx_len) { + dev_warn(rtwdev->dev, "Invalid RX packet size!"); + dev_kfree_skb_any(skb); + return; + } + + if ((curr_pkt_len + pkt_desc_sz) == rx_len) { /* Use the original skb (with it's adjusted offset) * when processing the last (or even the only) entry to * have it's memory freed automatically.