On 26/08/20 10:22 am, Chris Packham wrote: > On 25/08/20 7:22 pm, Heiner Kallweit wrote: > > <snip> >> I've been staring at spi-fsl-espi.c for while now and I think I've >>> identified a couple of deficiencies that may or may not be related >>> to my >>> issue. >>> >>> First I think the 'Transfer done but SPIE_DON isn't set' message can be >>> generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register. >>> We also write back to it to clear the current events. We re-read it in >>> fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can >>> naturally end up in that situation if we're doing a large read. >>> Consider >>> the messages for reading a block of data from a spi-nor chip >>> >>> tx = READ_OP + ADDR >>> rx = data >>> >>> We setup the transfer and pump out the tx_buf. The first interrupt goes >>> off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo, >>> clear ESPI_SPIE and wait for the next interrupt. The next interrupt >>> fires and this time we have ESPI_SPIE with just SPIM_RXT set. This >>> continues until we've received all the data and we finish with >>> ESPI_SPIE >>> having only SPIM_RXT set. When we re-read it we complain that SPIE_DON >>> isn't set. >>> >>> The other deficiency is that we only get an interrupt when the >>> amount of >>> data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than >>> FSL_ESPI_RXTHR left to be received we will never pull them out of >>> the fifo. >>> >> SPIM_DON will trigger an interrupt once the last characters have been >> transferred, and read the remaining characters from the FIFO. > > The T2080RM that I have says the following about the DON bit > > "Last character was transmitted. The last character was transmitted > and a new command can be written for the next frame." > > That does at least seem to fit with my assertion that it's all about > the TX direction. But the fact that it doesn't happen all the time > throws some doubt on it. > >> I think the reason I'm seeing some variability is because of how fast >>> (or slow) the interrupts get processed and how fast the spi-nor chip >>> can >>> fill the CPUs rx fifo. >>> >> To rule out timing issues at high bus frequencies I initially asked >> for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz >> or even less, then timing shouldn't be an issue. > Yes I've currently got spi-max-frequency = <1000000>; in my dts. I > would also expect a slower frequency would fit my "DON is for TX" > narrative. >> Last relevant functional changes have been done almost 4 years ago. >> And yours is the first such report I see. So question is what could >> be so >> special with your setup that it seems you're the only one being >> affected. >> The scenarios you describe are standard, therefore much more people >> should be affected in case of a driver bug. > Agreed. But even on my hardware (which may have a latent issue despite > being in the field for going on 5 years) the issue only triggers under > some fairly specific circumstances. >> You said that kernel config impacts how frequently the issue happens. >> Therefore question is what's the diff in kernel config, and how could >> the differences be related to SPI. > > It did seem to be somewhat random. Things like CONFIG_PREEMPT have an > impact but every time I found something that seemed to be having an > impact I've been able to disprove it. I actually think its about how > busy the system is which may or may not affect when we get round to > processing the interrupts. > > I have managed to get the 'Transfer done but SPIE_DON isn't set!' to > occur on the T2080RDB. > > I've had to add the following to expose the environment as a mtd > partition > > diff --git a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi > b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi > index ff87e67c70da..fbf95fc1fd68 100644 > --- a/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi > +++ b/arch/powerpc/boot/dts/fsl/t208xrdb.dtsi > @@ -116,6 +116,15 @@ flash@0 { > compatible = "micron,n25q512ax3", > "jedec,spi-nor"; > reg = <0>; > spi-max-frequency = <10000000>; /* > input clock */ > + > + partition@u-boot { > + reg = <0x00000000 0x00100000>; > + label = "u-boot"; > + }; > + partition@u-boot-env { > + reg = <0x00100000 0x00010000>; > + label = "u-boot-env"; > + }; > }; > }; > > And I'm using the following script to poke at the environment (warning > if anyone does try this and the bug hits it can render your u-boot > environment invalid). > > cat flash/fw_env_test.sh > #!/bin/sh > > generate_fw_env_config() > { > cat /proc/mtd | sed 's/[:"]//g' | while read dev size erasesize name > ; do > echo "$dev $size $erasesize $name" > [ "$name" = "u-boot-env" ] && echo "/dev/$dev 0x0000 0x2000 > $erasesize" >/flash/fw_env.config > done > } > > cycles=10 > [ $# -ge 1 ] && cycles=$1 > > generate_fw_env_config > > fw_printenv -c /flash/fw_env.config > > dmesg -c >/dev/null > x=0 > while [ $x -lt $cycles ]; do > fw_printenv -c /flash/fw_env.config >/dev/null || break > fw_setenv -c /flash/fw_env.config foo $RANDOM || break; > dmesg -c | grep -q fsl_espi && break; > let x=x+1 > done > > echo "Ran $x cycles" I've also now seen the RX FIFO not empty error on the T2080RDB fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 With my current workaround of emptying the RX FIFO. It seems survivable. Interestingly it only ever seems to be 1 extra byte in the RX FIFO and it seems to be after either a READ_SR or a READ_FSR. fsl_espi ffe110000.spi: tx 70 fsl_espi ffe110000.spi: rx 03 fsl_espi ffe110000.spi: Extra RX 00 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 fsl_espi ffe110000.spi: tx 05 fsl_espi ffe110000.spi: rx 00 fsl_espi ffe110000.spi: Extra RX 03 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 fsl_espi ffe110000.spi: tx 05 fsl_espi ffe110000.spi: rx 00 fsl_espi ffe110000.spi: Extra RX 03 From all the Micron SPI-NOR datasheets I've got access to it is possible to continually read the SR/FSR. But I've no idea why it happens some times and not others.