Dne sobota, 13. maj 2023 ob 23:21:47 CEST je Larry Finger napisal(a): > On 5/13/23 15:13, Jernej Škrabec wrote: > > Dne sobota, 13. maj 2023 ob 21:55:51 CEST je Larry Finger napisal(a): > >> On 5/13/23 05:23, Jernej Škrabec wrote: > >>> Larry, > >>> > >>> Dne sreda, 10. maj 2023 ob 23:47:24 CEST je Larry Finger napisal(a): > >>>> On 5/10/23 16:07, Martin Blumenstingl wrote: > >>>>> On Wed, May 10, 2023 at 12:02 AM Larry Finger <Larry.Finger@xxxxxxxxxxxx> wrote: > >>>>> [...] > >>>>>> I added that patch to the driver. The user reports that he was able to do a ping > >>>>>> and an nslookup before it crashed with the following in the log: > >>>>> That's some positive news alongside the crash log: it seems that a > >>>>> part of the driver works! :-) > >>>>> > >>>>>> [ 8.700626] skbuff: skb_over_panic: text:ffff8000011924ac len:3341 put:3341 > >>>>>> head:ffff000003b3c000 data:ffff000003b3c040 tail:0xd4d end:0x2c0 dev:<NULL> > >>>>> [...] > >>>>>> Somehow skb->tail was greater than skb->end. Unfortunately I do not have access > >>>>>> to gdb to tell you what line corresponds to rtw_sdio_rx_skb+0x50 on the MangoPi > >>>>>> MQ Quad. > >>>>> I need to have a closer look at the pkg_offset and struct > >>>>> rtw_rx_pkt_stat which we receive. > >>>>> Recently my own MangoPI MQ-Quad arrived but I did not have the time to > >>>>> set it up yet. I'll try to do so during the weekend so I can debug > >>>>> this on my own. > >>>>> > >>>>> Please ping me next week in case I haven't provided any update until then. > >>>> > >>>> I have some test prints in to check for skb overrun. My initial indication is > >>>> that the problem was in the c2h branch of rtw_sdio_rx_skb(), but my next run > >>>> should verify that. My changes will do a pr_warn_once() when the problem > >>>> happens, and then drop the skb. > >>>> > >>>> My contact reported that he had one run of 3 minutes before the problem > >>>> happened, which is good news for most of the driver. > >>> > >>> I may have discovered something interesting. rtl8723ds vendor driver has > >>> following checks in RX data parsing code: > >>> https://github.com/lwfinger/rtl8723ds/blob/master/hal/rtl8723d/sdio/rtl8723ds_recv.c#L83-L99 > >>> > >>> Those checks are absent in rtl8822bs vendor driver, which was my original > >>> development platform for SDIO. This may indicate some kind of bug in FW > >>> and/or HW. > >>> > >>> I think that at least second check, which checks for exactly the case your > >>> client experience, can be easily added and tested. > >> > >> Thanks for this update. I added the following to the start of rtw_sdio_rx_skb(): > >> /* fix Hardware RX data error, drop whole recv_buffer */ > >> if (!(rtwdev->hal.rcr & BIT_ACRC32) && pkt_stat->crc_err) { > >> kfree_skb(skb); > >> return; > >> } > >> I think that duplicates the code in the vendor driver. > >> > >> I have not heard from my user as to whether it helps. My communications with him > >> are at https://github.com/lwfinger/rtl8723ds/issues/37. > > > > I had second part in mind (see attachment), but this is IMO only sanity check > > and it will mask the issue. At this point I'm not sure if this is something that > > can happen occasionally or is there additional bug in rtw88 code. I'll check > > rtl8723ds c2h code in greater detail. > > > > In any case, I would argue that all 3 patches in this thread are valid and > > should be submitted upstream. > > That patch looks good. I ill apply it to my rtw88 repo. I see that issue is still there. Next idea would be to check if RX aggregation is the problem. Just commenting out call to rtw_sdio_enable_rx_aggregation() is enough to disable it. Best regards, Jernej