On 30.08.2020 23:59, Chris Packham wrote: > > On 31/08/20 9:41 am, Heiner Kallweit wrote: >> On 30.08.2020 23:00, Chris Packham wrote: >>> On 31/08/20 12:30 am, Nicholas Piggin wrote: >>>> Excerpts from Chris Packham's message of August 28, 2020 8:07 am: >>> <snip> >>> >>>>>>>>> I've also now seen the RX FIFO not empty error on the T2080RDB >>>>>>>>> >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>>> >>>>>>>>> With my current workaround of emptying the RX FIFO. It seems >>>>>>>>> survivable. Interestingly it only ever seems to be 1 extra byte in the >>>>>>>>> RX FIFO and it seems to be after either a READ_SR or a READ_FSR. >>>>>>>>> >>>>>>>>> fsl_espi ffe110000.spi: tx 70 >>>>>>>>> fsl_espi ffe110000.spi: rx 03 >>>>>>>>> fsl_espi ffe110000.spi: Extra RX 00 >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>>> fsl_espi ffe110000.spi: tx 05 >>>>>>>>> fsl_espi ffe110000.spi: rx 00 >>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! >>>>>>>>> fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! >>>>>>>>> fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 >>>>>>>>> fsl_espi ffe110000.spi: tx 05 >>>>>>>>> fsl_espi ffe110000.spi: rx 00 >>>>>>>>> fsl_espi ffe110000.spi: Extra RX 03 >>>>>>>>> >>>>>>>>> From all the Micron SPI-NOR datasheets I've got access to it is >>>>>>>>> possible to continually read the SR/FSR. But I've no idea why it >>>>>>>>> happens some times and not others. >>>>>>>> So I think I've got a reproduction and I think I've bisected the problem >>>>>>>> to commit 3282a3da25bd ("powerpc/64: Implement soft interrupt replay in >>>>>>>> C"). My day is just finishing now so I haven't applied too much scrutiny >>>>>>>> to this result. Given the various rabbit holes I've been down on this >>>>>>>> issue already I'd take this information with a good degree of skepticism. >>>>>>>> >>>>>>> OK, so an easy test should be to re-test with a 5.4 kernel. >>>>>>> It doesn't have yet the change you're referring to, and the fsl-espi driver >>>>>>> is basically the same as in 5.7 (just two small changes in 5.7). >>>>>> There's 6cc0c16d82f88 and maybe also other interrupt related patches >>>>>> around this time that could affect book E, so it's good if that exact >>>>>> patch is confirmed. >>>>> My confirmation is basically that I can induce the issue in a 5.4 kernel >>>>> by cherry-picking 3282a3da25bd. I'm also able to "fix" the issue in >>>>> 5.9-rc2 by reverting that one commit. >>>>> >>>>> I both cases it's not exactly a clean cherry-pick/revert so I also >>>>> confirmed the bisection result by building at 3282a3da25bd (which sees >>>>> the issue) and the commit just before (which does not). >>>> Thanks for testing, that confirms it well. >>>> >>>> [snip patch] >>>> >>>>> I still saw the issue with this change applied. PPC_IRQ_SOFT_MASK_DEBUG >>>>> didn't report anything (either with or without the change above). >>>> Okay, it was a bit of a shot in the dark. I still can't see what >>>> else has changed. >>>> >>>> What would cause this, a lost interrupt? A spurious interrupt? Or >>>> higher interrupt latency? >>>> >>>> I don't think the patch should cause significantly worse latency, >>>> (it's supposed to be a bit better if anything because it doesn't set >>>> up the full interrupt frame). But it's possible. >>> My working theory is that the SPI_DON indication is all about the TX >>> direction an now that the interrupts are faster we're hitting an error >>> because there is still RX activity going on. Heiner disagrees with my >>> interpretation of the SPI_DON indication and the fact that it doesn't >>> happen every time does throw doubt on it. >>> >> It's right that the eSPI spec can be interpreted that SPI_DON refers to >> TX only. However this wouldn't really make sense, because also for RX >> we program the frame length, and therefore want to be notified once the >> full frame was received. Also practical experience shows that SPI_DON >> is set also after RX-only transfers. >> Typical SPI NOR use case is that you write read command + start address, >> followed by a longer read. If the TX-only interpretation would be right, >> we'd always end up with SPI_DON not being set. >> >>> I can't really explain the extra RX byte in the fifo. We know how many >>> bytes to expect and we pull that many from the fifo so it's not as if >>> we're missing an interrupt causing us to skip the last byte. I've been >>> looking for some kind of off-by-one calculation but again if it were >>> something like that it'd happen all the time. >>> >> Maybe it helps to know what value this extra byte in the FIFO has. Is it: >> - a duplicate of the last read byte >> - or the next byte (at <end address> + 1) >> - or a fixed value, e.g. always 0x00 or 0xff > > The values were up thread a bit but I'll repeat them here > > fsl_espi ffe110000.spi: tx 70 > fsl_espi ffe110000.spi: rx 03 > fsl_espi ffe110000.spi: Extra RX 00 > fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! > fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 > fsl_espi ffe110000.spi: tx 05 > fsl_espi ffe110000.spi: rx 00 > fsl_espi ffe110000.spi: Extra RX 03 > fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set! > fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't empty! > fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32 > fsl_espi ffe110000.spi: tx 05 > fsl_espi ffe110000.spi: rx 00 > fsl_espi ffe110000.spi: Extra RX 03 > > > The rx 00 Extra RX 03 is a bit concerning. I've only ever seen them with > either a READ_SR or a READ_FSR. Never a data read. > Just remembered something about SPIE_DON: Transfers are always full duplex, therefore in case of a read the chip sends dummy zero's. Having said that in case of a read SPIE_DON means that the last dummy zero was shifted out. READ_SR and READ_FSR are the shortest transfers, 1 byte out and 1 byte in. So the issue may have a dependency on the length of the transfer. However I see no good explanation so far. You can try adding a delay of a few miroseconds between the following to commands in fsl_espi_bufs(). fsl_espi_write_reg(espi, ESPI_SPIM, mask); /* Prevent filling the fifo from getting interrupted */ spin_lock_irq(&espi->lock); Maybe enabling interrupts and seeing the SPIE_DON interrupt are too close.