Re: fsl_espi errors on v5.7.15

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 26.08.2020 08:07, Chris Packham wrote:
> 
> On 26/08/20 1:48 pm, Chris Packham wrote:
>>
>> On 26/08/20 10:22 am, Chris Packham wrote:
>>> On 25/08/20 7:22 pm, Heiner Kallweit wrote:
>>>
>>> <snip>
>>>> I've been staring at spi-fsl-espi.c for while now and I think I've
>>>>> identified a couple of deficiencies that may or may not be related 
>>>>> to my
>>>>> issue.
>>>>>
>>>>> First I think the 'Transfer done but SPIE_DON isn't set' message 
>>>>> can be
>>>>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE 
>>>>> register.
>>>>> We also write back to it to clear the current events. We re-read it in
>>>>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can
>>>>> naturally end up in that situation if we're doing a large read. 
>>>>> Consider
>>>>> the messages for reading a block of data from a spi-nor chip
>>>>>
>>>>>    tx = READ_OP + ADDR
>>>>>    rx = data
>>>>>
>>>>> We setup the transfer and pump out the tx_buf. The first interrupt 
>>>>> goes
>>>>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo,
>>>>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt
>>>>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This
>>>>> continues until we've received all the data and we finish with 
>>>>> ESPI_SPIE
>>>>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON
>>>>> isn't set.
>>>>>
>>>>> The other deficiency is that we only get an interrupt when the 
>>>>> amount of
>>>>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than
>>>>> FSL_ESPI_RXTHR left to be received we will never pull them out of 
>>>>> the fifo.
>>>>>
>>>> SPIM_DON will trigger an interrupt once the last characters have been
>>>> transferred, and read the remaining characters from the FIFO.
>>>
>>> The T2080RM that I have says the following about the DON bit
>>>
>>> "Last character was transmitted. The last character was transmitted 
>>> and a new command can be written for the next frame."
>>>
>>> That does at least seem to fit with my assertion that it's all about 
>>> the TX direction. But the fact that it doesn't happen all the time 
>>> throws some doubt on it.
>>>
>>>> I think the reason I'm seeing some variability is because of how fast
>>>>> (or slow) the interrupts get processed and how fast the spi-nor 
>>>>> chip can
>>>>> fill the CPUs rx fifo.
>>>>>
>>>> To rule out timing issues at high bus frequencies I initially asked
>>>> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
>>>> or even less, then timing shouldn't be an issue.
>>> Yes I've currently got spi-max-frequency = <1000000>; in my dts. I 
>>> would also expect a slower frequency would fit my "DON is for TX" 
>>> narrative.
>>>> Last relevant functional changes have been done almost 4 years ago.
>>>> And yours is the first such report I see. So question is what could 
>>>> be so
>>>> special with your setup that it seems you're the only one being 
>>>> affected.
>>>> The scenarios you describe are standard, therefore much more people
>>>> should be affected in case of a driver bug.
>>> Agreed. But even on my hardware (which may have a latent issue 
>>> despite being in the field for going on 5 years) the issue only 
>>> triggers under some fairly specific circumstances.
>>>> You said that kernel config impacts how frequently the issue happens.
>>>> Therefore question is what's the diff in kernel config, and how could
>>>> the differences be related to SPI.
>>>
>>> It did seem to be somewhat random. Things like CONFIG_PREEMPT have an 
>>> impact but every time I found something that seemed to be having an 
>>> impact I've been able to disprove it. I actually think its about how 
>>> busy the system is which may or may not affect when we get round to 
>>> processing the interrupts.
>>>
>>> I have managed to get the 'Transfer done but SPIE_DON isn't set!' to 
>>> occur on the T2080RDB.
>>>
>>> I've had to add the following to expose the environment as a mtd 
>>> partition
>>>
>>> diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi 
>>> b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> index ff87e67c70da..fbf95fc1fd68 100644
>>> --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi
>>> @@ -116,6 +116,15 @@ flash@0 {
>>>                                 compatible = "micron,n25q512ax3", 
>>> "jedec,spi-nor";
>>>                                 reg = <0>;
>>>                                 spi-max-frequency = <10000000>; /* 
>>> input clock */
>>> +
>>> +                               partition@u-boot {
>>> +                                        reg = <0x00000000 0x00100000>;
>>> +                                        label = "u-boot";
>>> +                                };
>>> +                                partition@u-boot-env {
>>> +                                        reg = <0x00100000 0x00010000>;
>>> +                                        label = "u-boot-env";
>>> +                                };
>>>                         };
>>>                 };
>>>
>>> And I'm using the following script to poke at the environment 
>>> (warning if anyone does try this and the bug hits it can render your 
>>> u-boot environment invalid).
>>>
>>> cat flash/fw_env_test.sh
>>> #!/bin/sh
>>>
>>> generate_fw_env_config()
>>> {
>>>   cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize 
>>> name ; do
>>>      echo "$dev $size $erasesize $name"
>>>      [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x0000 0x2000 
>>> $erasesize" >/flash/fw_env.config
>>>   done
>>> }
>>>
>>> cycles=10
>>> [ $# -ge 1 ] && cycles=$1
>>>
>>> generate_fw_env_config
>>>
>>> fw_printenv -c /flash/fw_env.config
>>>
>>> dmesg -c >/dev/null
>>> x=0
>>> while [ $x -lt $cycles ]; do
>>>     fw_printenv -c /flash/fw_env.config >/dev/null || break
>>>     fw_setenv -c /flash/fw_env.config foo $RANDOM || break;
>>>     dmesg -c | grep -q fsl_espi && break;
>>>     let x=x+1
>>> done
>>>
>>> echo "Ran $x cycles"
>>
>> I've also now seen the RX FIFO not empty error on the T2080RDB
>>
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>>
>> With my current workaround of emptying the RX FIFO. It seems 
>> survivable. Interestingly it only ever seems to be 1 extra byte in the 
>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR.
>>
>> fsl_espi ffe110000.spi: tx 70
>> fsl_espi ffe110000.spi: rx 03
>> fsl_espi ffe110000.spi: Extra RX 00
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty!
>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
>> fsl_espi ffe110000.spi: tx 05
>> fsl_espi ffe110000.spi: rx 00
>> fsl_espi ffe110000.spi: Extra RX 03
>>
>> From all the Micron SPI-NOR datasheets I've got access to it is 
>> possible to continually read the SR/FSR. But I've no idea why it 
>> happens some times and not others.
> 
> So I think I've got a reproduction and I think I've bisected the problem 
> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in 
> C"). My day is just finishing now so I haven't applied too much scrutiny 
> to this result. Given the various rabbit holes I've been down on this 
> issue already I'd take this information with a good degree of skepticism.
> 
OK, so an easy test should be to re-test with a 5.4 kernel.
It doesn't have yet the change you're referring to, and the fsl-espi driver
is basically the same as in 5.7 (just two small changes in 5.7).

> Thanks,
> Chris
> 




[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux