On 31/08/20 9:41 am, Heiner Kallweit wrote: > On 30.08.2020 23:00, Chris Packham wrote: >> On 31/08/20 12:30 am, Nicholas Piggin wrote: >>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am: >> <snip> >> >>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB >>>>>>>> >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>> >>>>>>>> With my current workaround of emptying the RX FIFO. It seems >>>>>>>> survivable. Interestingly it only ever seems to be 1 extra byte in the >>>>>>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR. >>>>>>>> >>>>>>>> fsl_espi ffe110000.spi: tx 70 >>>>>>>> fsl_espi ffe110000.spi: rx 03 >>>>>>>> fsl_espi ffe110000.spi: Extra RX 00 >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>> fsl_espi ffe110000.spi: tx 05 >>>>>>>> fsl_espi ffe110000.spi: rx 00 >>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 >>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>> fsl_espi ffe110000.spi: tx 05 >>>>>>>> fsl_espi ffe110000.spi: rx 00 >>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 >>>>>>>> >>>>>>>> From all the Micron SPI-NOR datasheets I've got access to it is >>>>>>>> possible to continually read the SR/FSR. But I've no idea why it >>>>>>>> happens some times and not others. >>>>>>> So I think I've got a reproduction and I think I've bisected the problem >>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in >>>>>>> C"). My day is just finishing now so I haven't applied too much scrutiny >>>>>>> to this result. Given the various rabbit holes I've been down on this >>>>>>> issue already I'd take this information with a good degree of skepticism. >>>>>>> >>>>>> OK, so an easy test should be to re-test with a 5.4 kernel. >>>>>> It doesn't have yet the change you're referring to, and the fsl-espi driver >>>>>> is basically the same as in 5.7 (just two small changes in 5.7). >>>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches >>>>> around this time that could affect book E, so it's good if that exact >>>>> patch is confirmed. >>>> My confirmation is basically that I can induce the issue in a 5.4 kernel >>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in >>>> 5.9-rc2 by reverting that one commit. >>>> >>>> I both cases it's not exactly a clean cherry-pick/revert so I also >>>> confirmed the bisection result by building at 3282a3da25bd (which sees >>>> the issue) and the commit just before (which does not). >>> Thanks for testing, that confirms it well. >>> >>> [snip patch] >>> >>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG >>>> didn't report anything (either with or without the change above). >>> Okay, it was a bit of a shot in the dark. I still can't see what >>> else has changed. >>> >>> What would cause this, a lost interrupt? A spurious interrupt? Or >>> higher interrupt latency? >>> >>> I don't think the patch should cause significantly worse latency, >>> (it's supposed to be a bit better if anything because it doesn't set >>> up the full interrupt frame). But it's possible. >> My working theory is that the SPI_DON indication is all about the TX >> direction an now that the interrupts are faster we're hitting an error >> because there is still RX activity going on. Heiner disagrees with my >> interpretation of the SPI_DON indication and the fact that it doesn't >> happen every time does throw doubt on it. >> > It's right that the eSPI spec can be interpreted that SPI_DON refers to > TX only. However this wouldn't really make sense, because also for RX > we program the frame length, and therefore want to be notified once the > full frame was received. Also practical experience shows that SPI_DON > is set also after RX-only transfers. > Typical SPI NOR use case is that you write read command + start address, > followed by a longer read. If the TX-only interpretation would be right, > we'd always end up with SPI_DON not being set. > >> I can't really explain the extra RX byte in the fifo. We know how many >> bytes to expect and we pull that many from the fifo so it's not as if >> we're missing an interrupt causing us to skip the last byte. I've been >> looking for some kind of off-by-one calculation but again if it were >> something like that it'd happen all the time. >> > Maybe it helps to know what value this extra byte in the FIFO has. Is it: > - a duplicate of the last read byte > - or the next byte (at <end address> + 1) > - or a fixed value, e.g. always 0x00 or 0xff The values were up thread a bit but I'll repeat them here fsl_espi ffe110000.spi: tx 70 fsl_espi ffe110000.spi: rx 03 fsl_espi ffe110000.spi: Extra RX 00 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 fsl_espi ffe110000.spi: tx 05 fsl_espi ffe110000.spi: rx 00 fsl_espi ffe110000.spi: Extra RX 03 fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 fsl_espi ffe110000.spi: tx 05 fsl_espi ffe110000.spi: rx 00 fsl_espi ffe110000.spi: Extra RX 03 The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them with either a READ_SR or a READ_FSR. Never a data read.