Re: Possible bug on wilc1000 [Klartext]

<Ajay.Kathat@xxxxxxxxxxxxx> · Fri, 11 Feb 2022 11:23:31 +0000

On 11/02/22 15:59, Marek Vasut wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know 
> the content is safe
>
> On 2/11/22 11:15, Ajay.Kathat@xxxxxxxxxxxxx wrote:
>> On 10/02/22 21:55, Marek Vasut wrote:
>>>
>>> On 2/10/22 17:19, Ajay.Kathat@xxxxxxxxxxxxx wrote:
>>>
>>> Hi,
>>>
>>>> On 10/02/22 14:10, Christoph Niedermaier wrote:
>>>>> From: Ajay.Kathat@xxxxxxxxxxxxx [mailto:Ajay.Kathat@xxxxxxxxxxxxx]
>>>>> Sent: Wednesday, February 9, 2022 3:37 PM
>>>>>> On 08/02/22 21:56, Christoph Niedermaier wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I tested the wireless chip wilc1000 with the 5.16.5 Kernel and the
>>>>>>> firmware v15.4.1
>>>>>>> (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/atmel/wilc1000_wifi_firmware-1.bin) 
>>>>>>>
>>>>>>>
>>>>>>> on an i.MX6 QUAD with iperf3:
>>>>>>>
>>>>>>> # iperf3 -c IP_ADDR -P 16 -t 0
>>>>>>>
>>>>>>> After a while the test gets stuck and I got the following kernel
>>>>>>> messages:
>>>>>>> mmc0: Timeout waiting for hardware interrupt.
>>>>>>> mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>>>>>>> mmc0: sdhci: Sys addr:  0x138f0200 | Version: 0x00000002
>>>>>>> mmc0: sdhci: Blk size:  0x00000158 | Blk cnt: 0x00000001
>>>>>>> mmc0: sdhci: Argument:  0x14000158 | Trn mode: 0x00000013
>>>>>>> mmc0: sdhci: Present:   0x01d88a0a | Host ctl: 0x00000013
>>>>>>> mmc0: sdhci: Power:     0x00000002 | Blk gap: 0x00000080
>>>>>>> mmc0: sdhci: Wake-up:   0x00000008 | Clock: 0x0000009f
>>>>>>> mmc0: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
>>>>>>> mmc0: sdhci: Int enab:  0x107f100b | Sig enab: 0x107f100b
>>>>>>> mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000003
>>>>>>> mmc0: sdhci: Caps:      0x07eb0000 | Caps_1: 0x0000a000
>>>>>>> mmc0: sdhci: Cmd:       0x0000353a | Max curr: 0x00ffffff
>>>>>>> mmc0: sdhci: Resp[0]:   0x00001000 | Resp[1]: 0x00000000
>>>>>>> mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]: 0x00000000
>>>>>>> mmc0: sdhci: Host ctl2: 0x00000000
>>>>>>> mmc0: sdhci: ADMA Err:  0x00000007 | ADMA Ptr: 0x4c041200
>>>>>>> mmc0: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP
>>>>>>> =========
>>>>>>> mmc0: sdhci-esdhc-imx: cmd debug status:  0x2100
>>>>>>> mmc0: sdhci-esdhc-imx: data debug status:  0x2200
>>>>>>> mmc0: sdhci-esdhc-imx: trans debug status:  0x2300
>>>>>>> mmc0: sdhci-esdhc-imx: dma debug status:  0x2402
>>>>>>> mmc0: sdhci-esdhc-imx: adma debug status:  0x25b4
>>>>>>> mmc0: sdhci-esdhc-imx: fifo debug status:  0x2610
>>>>>>> mmc0: sdhci-esdhc-imx: async fifo debug status: 0x2751
>>>>>>> mmc0: sdhci: ============================================
>>>>>>> wilc1000_sdio mmc0:0001:1: wilc_sdio_cmd53..failed, err(-110)
>>>>>>> wilc1000_sdio mmc0:0001:1: Failed cmd53 [0], bytes read...
>>>>>>>
>>>>>>> I tried to reduce the clock speed to 20MHz in the devicetree with
>>>>>>> max-frequency = <20000000>;
>>>>>>> but the problem then also occurs.
>>>>>>>
>>>>>>> Is this a possible bug?
>>>>>>>
>>>>>>>
>>>>> Hi Ajay,
>>>>> Thanks for the answer.
>>>>>
>>>>>> The bus error seems to be specific to the host during the SDIO
>>>>>> transfer.
>>>>>> How long does it take to reproduce it? Does the issue also happen
>>>>>> without "-P 16" iPerf3 option?
>>>>> It takes about 10s (something a bit longer) till I got this kernel
>>>>> error
>>>>> messages and it doesn't matter if I use it with "-P 16" or without.
>>>>
>>>>
>>>> I did not observe the issue with my setup(SAMA5D4 XPLAINED + WILC1000
>>>> SDIO) when tested iPerf for a longer duration(~1000sec). I suspect the
>>>> issue could be related to the SDHCI host controller.
>>>> Try to debug the host controller side for the possible cause of 
>>>> timeout.
>>>
>>> It seems the timeout happens because the card fails to respond to SDIO
>>> command 53, right ?
>>>
>>
>> Yes, the timeout could be for any reason like either the CMD53 has not
>> reached to chip or response not received correctly at host end.
>
> The problem happens seconds or tens of seconds into the test, so there
> must've been CMD53 which reached the card before the problem occurred,
> and there must have been a lot of those CMD53 before the problem
> happened too, since CMD53 seems to be some data transfer CMD ?
>
>>> Is there some error logging/tracing functionality in the WILC1000
>>> firmware which can provide further information why the card did not
>>> respond ?
>>
>>
>> WILC1000 SD module has UART serial debug port for firmware logs but I
>> don't think it would be useful here because it needs to be debug/probe
>> at SDIO bus level.
>
> Is there some other kind of logging which can tell us more details on
> where to look for this problem ?
>
> Maybe we can try monitoring the SDIO traffic with ftrace ?
>
> Any other options, short of taking the hardware apart ?
>
>>> Could it be the card suffered some sort of FIFO overflow ? The MX6Q 
>>> is a
>>> bit more performant than the CA7 (I think?) SAMA5D4, so maybe that 
>>> plays
>>> some role ?
>>
>> As I understand, the issue is observed with basic iPerf testing(less
>> throughput) so not sure if the host performance will have such an
>> impact. IIRC few of the customers are using the same host(i.MX6) though
>> I am not sure if it's over SPI or SDIO bus. Till now, I have not come
>> across such limitations with the specific host.
>
> That iperf -P 16 hammers the chip with a lot of short packets, the
> problem does not occur during iperf3 -P 1 run or UDP iperf3 run (that's
> the one with low traffic). Here the interface is saturated, that's why I
> speculate some sort of FIFO overrun is happening.
>

But earlier it was mentioned that the problem doesn't matter with or 
without "-P" option. So it seems the issue happens during stress test.

> I have also noticed there are some wilc1000 downstream drivers with huge
> stacks of patches, but I never really figured out whether those are
> still relevant or whether the upstream wilc1000 driver is perfectly
> fine. I would like to believe it is the later, is it ?

Yes, wilc1000 mainline driver has all the required changes except few 
minor bug fixes and new features but none related to this scenario.

Regards,
Ajay