On 28/09/18 20:29, Daniel Mack wrote: > On 28/9/2018 10:24 AM, Miquel Raynal wrote: >> Hi Daniel, >> >> Daniel Mack <daniel@xxxxxxxxxx> wrote on Fri, 28 Sep 2018 09:43:18 >> +0200: >> >>> Hi Chris, >>> >>> On 27/9/2018 11:55 PM, Chris Packham wrote: >>>> On 27/09/18 20:56, Boris Brezillon wrote: >>>>> On Thu, 27 Sep 2018 10:11:45 +0200 >>>>> Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote: >>>>> >>>>>> Hi Daniel, >>>>>> >>>>>> Daniel Mack <daniel@xxxxxxxxxx> wrote on Thu, 27 Sep 2018 09:17:51 >>>>>> +0200: >>>>>> >>>>>>> At least on PXA3xx platforms, enabling RDY interrupts in the NDCR register >>>>>>> will only cause the IRQ to latch when the RDY lanes are changing, and not >>>>>>> in case they are already asserted. >>>>>>> >>>>>>> This means that if the controller finished the command in flight before >>>>>>> marvell_nfc_wait_op() is called, that function will wait for a change in >>>>>>> the bit that can't ever happen as it is already set. >>>>>>> >>>>>>> To address this race, check for the RDY bits after the IRQ was enabled, >>>>>>> and complete the completion immediately if the condition is already met. >>>>>>> >>>>>>> This fixes a bug that was observed with a NAND chip that holds a UBIFS >>>>>>> parition on which file system stress tests were executed. When >>>>>>> marvell_nfc_wait_op() reports an error, UBI/UBIFS will eventually mount >>>>>>> the filesystem read-only, reporting lots of warnings along the way. >>>>>>> >>>>>>> Fixes: 02f26ecf8c77 mtd: nand: add reworked Marvell NAND controller driver >>>>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>>>> Signed-off-by: Daniel Mack <daniel@xxxxxxxxxx> >>>>>>> --- >>>>>> >>>>>> Sorry I haven't had the time to check on my Armada, but you figured it >>>>>> out, and the fix looks good to me! >>>>>> >>>>>> Acked-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx> >>>>>> >>>>>> Boris, do you plan to send another fixes PR of can I take it into >>>>>> the nand/next branch? >>>>> >>>>> Queued to mtd/master. >>>>> After fixing my R/B configuration I get a new error with this patch when >>>> running stress_1 from mtd-utils-2.0.0. I don't see this without the patch. >>> >>> That's strange. So your controller sets the RDY bits before it is ready? Could >>> you check whether only checking for NDSR_RDY(0) changes anything? Not sure >>> about the handling of NDSR_RDY(1) in this driver anyway ... >>> >> >> I suppose you mean this portion of code is not clear enough? >> >> >> u32 st = readl_relaxed(nfc->regs + NDSR); >> u32 ien = (~readl_relaxed(nfc->regs + NDCR)) & NDCR_ALL_INT; >> >> /* >> * RDY interrupt mask is one bit in NDCR while there are two status >> * bit in NDSR (RDY[cs0/cs2] and RDY[cs1/cs3]). >> */ >> if (st & NDSR_RDY(1)) >> st |= NDSR_RDY(0); >> >> if (!(st & ien)) >> return IRQ_NONE; >> >> >> -> st is the status in the NDSR register which has two RDY bits, one >> for each RDY line. >> -> ien is a view of the NDCR register which commands the interrupts >> and has one bit to enable both interrupt lines (let's call it >> NDCR_RDYM). >> >> The trick is that NDSR_RDY(0) is the same bit as NDCR_RDYM. >> So whenever NDSR_RDY(1) is set, we fake NDSR_RDY(0) to be set (by >> setting manually the bit in 'st') so that the (st & ien) comparison >> can be true if NDSR_RDY(1) is valid and RDY interrupts are enabled. >> >> With this in mind, I don't see why this >> >> + st = readl_relaxed(nfc->regs + NDSR); >> + if (st & (NDSR_RDY(0) | NDSR_RDY(1))) >> + complete(&nfc->complete); > > Yeah, me neither. Chris, are you absolutely sure this is the reason? I'm > asking because it took me several tries sometimes to trigger the bug, so > is there a chance that you see an error at all times, regardless of > whether my patch is applied? It seems pretty consistent. Without this patch there seems to be no problem. With this patch it triggers pretty much straight away. I can't discount that there might be something wrong with my dts (the R/B configuration was missing initially). I've also been able to run this on the DB-88F6820-AMC board with the same result (the dts for this is in the for-next branch of git://git.infradead.org/linux-mvebu.git). The really odd thing is the following seems to avoid the problem + st = readl_relaxed(nfc->regs + NDSR); + udelay(1000); + if (st & (NDSR_RDY(0) | NDSR_RDY(1))) + complete(&nfc->complete); Which is weird because the st value has already been read so the udelay should have no effect. On 28/09/18 19:43, Daniel Mack wrote: > > Also, does my .EALREADY approach (v1) make any difference? > The v1 of this patch doesn't show the problem.