On Tue, 8 May 2018 23:12:59 +0200
Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote:

> On Fri, 4 May 2018 11:58:35 +0200
> Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
>
> > Hi Boris,
> >
> > On Thu, 3 May 2018 09:49:08 +0200, Boris Brezillon
> > <boris.brezillon@xxxxxxxxxxx> wrote:
> >
> > > It looks like the NAND_STATUS_FAIL bit is sticky after an ECC failure,
> > > which leads all READ operations following the failing one to report
> > > an ECC failure. Reset the chip to clear the NAND_STATUS_FAIL bit.
> > >
> > > Note that this behavior is not document in the datasheet, but resetting
> > > the chip is the only solution we found to fix the problem.
> > >
> > > Fixes: 9748e1d87573 ("mtd: nand: add support for Micron on-die ECC")
> > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxx>
> > > Cc: Thomas Petazzoni <thomas.petazzoni@xxxxxxxxxxx>
> > > Cc: Bean Huo <beanhuo@xxxxxxxxxx>
> > > Cc: Peter Pan <peterpandong@xxxxxxxxxx>
> > > ---
> >
> > Reviewed-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
>
> Queued to mtd/master.

I'm dropping this patch because I'm no longer sure this is the correct
way to fix the bug. It seems that nand_set_features_op() checks the FAIL
bit, while the ONFI spec clearly says that the FAIL bit is only valid
after a PROGRAM, ERASE or READ-with-on-die-ECC-enabled op. That might
explain why ->set_features() fails with -EIO after an ECC failure:
apparently Micron only clears the FAIL bit when launching a PROGRAM,
ERASE or READ-with-on-die-ECC-enabled op, not on a SET_FEATURES op.
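
To make the reasoning above concrete, here is a rough userspace sketch of
the rule as I read it. It is not the nand_base.c code: the enum, the
helper names and the on_die_ecc flag are made up for illustration, and
only the FAIL bit position (bit 0 of the status byte) is taken from the
kernel's NAND_STATUS_FAIL definition.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of the status byte, same value as the kernel's NAND_STATUS_FAIL. */
#define STATUS_FAIL 0x01

/* Hypothetical operation kinds, for illustration only. */
enum nand_op_kind {
	OP_READ,
	OP_PROGRAM,
	OP_ERASE,
	OP_SET_FEATURES,
};

/*
 * The FAIL bit is only meaningful after PROGRAM, ERASE, or a READ issued
 * with on-die ECC enabled. After anything else (e.g. SET_FEATURES) it may
 * be a stale leftover from an earlier operation.
 */
static bool fail_bit_is_meaningful(enum nand_op_kind op, bool on_die_ecc)
{
	switch (op) {
	case OP_PROGRAM:
	case OP_ERASE:
		return true;
	case OP_READ:
		return on_die_ecc;
	default:
		return false;
	}
}

/* Turn a status byte into 0 or -EIO, ignoring FAIL when it is not valid. */
static int op_result(enum nand_op_kind op, bool on_die_ecc, uint8_t status)
{
	if (fail_bit_is_meaningful(op, on_die_ecc) && (status & STATUS_FAIL))
		return -EIO;

	return 0;
}
```

With this rule, a SET_FEATURES op issued right after an ECC failure would
return 0 even though the sticky FAIL bit is still set, whereas a READ with
on-die ECC enabled would still report -EIO.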