On Thu, Feb 21, 2019 at 12:37:09PM +0100, Boris Brezillon wrote:
> On Thu, 21 Feb 2019 12:21:38 +0100
> Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
>
> > On Thu, Feb 21, 2019 at 11:36:46AM +0100, Boris Brezillon wrote:
> > > On Thu, 21 Feb 2019 11:17:47 +0100
> > > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> > >
> > > > Hi Boris,
> > > >
> > > > On Thu, Feb 21, 2019 at 09:10:55AM +0100, Boris Brezillon wrote:
> > > > > Hi Sascha,
> > > > >
> > > > > On Wed, 20 Feb 2019 14:58:20 +0100
> > > > > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I have hardware here for which the normal way to turn off is just to cut
> > > > > > the power. When the powercut happens during a NAND page write then we
> > > > > > get more or less completely written pages during next boot. Very rarely
> > > > > > it seems to happen that such a half written page with only very few
> > > > > > flipped bits is erroneously detected as empty and written again which
> > > > > > then results in ECC errors when reading the data.
> > > > >
> > > > > This should definitely be fixed, maybe by lowering the bitflip
> > > > > threshold when doing the empty check. Do you know the ECC strength and
> > > > > the number of bitflips you have when that problem occurs?
> > > >
> > > > The problem is that these half written pages do not seem to be very
> > > > stable. It happens that the number of bitflips change with each read.
> > > > I have seen pages which can be read sometimes and sometimes not. It
> > > > really seems that half written pages must be avoided entirely.
> > >
> > > But when they are correctly read, do you know how many bitflips they
> > > have?
> >
> > Yes, I know this number,
>
> Is it high? I mean, can we tweak the threshold to detect such cases?

I haven't told you the number because I have seen pages with a single
bitflip, pages with 3 bitflips, pages with 8 bitflips, anything you like.
Funnily enough, once I write to the following pages up to the end of the
block, the page which previously had only a few bitflips turns into a
nearly random pattern.

> > but as said, it varies when reading the same
> > page again.
>
> Yes, I can imagine, as cells were weakly programmed, their state might
> change from one read to another. What I'm mainly interested in is the
> state on the first read after the powercut.
>
> >
> > > To be honest, I fear not all users will be able to be informed
> > > that powercuts are about to happen, and we need a way to fix that for
> > > everyone.
> >
> > If you want that I'm afraid we can't fix this on this level. You can't
> > rely on the data when a powercut happens during write. It may look ok at
> > first, but bitflips can develop later. I just found an interesting read
> > here:
> > https://www.datalight.com/blog/2017/03/08/enemy-of-nand-flash-memory/
> >
> > If you want to fix it for all users you have to track somehow which
> > pages you have written last and discard the data or copy it to another
> > block.
>
> Did you have the problem that pages written with valid data turn into
> uncorrectable pages on the second read attempt or is it just
> erased/empty pages that turn out to be not so empty/erased and next
> time you write to them you end up with uncorrectable errors?

Both. (Seemingly) valid pages turned into uncorrectable pages, but as
said above, pages turned into garbage once I wrote other pages in the
same block.

I haven't made any tests with power cuts during erase. Power cuts during
erase are probably not that critical, because blocks without a valid
erase marker will be erased again next time.

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
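
For readers following the thread, the "empty check with a bitflip
threshold" discussed above can be illustrated roughly as follows. This
is only a minimal sketch and not the kernel's actual implementation: the
function names count_zero_bits() and page_looks_erased() and the
threshold parameter are hypothetical. It assumes that an erased NAND
page reads back as all 0xFF, so every bit that reads as 0 counts as a
deviation from the erased state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the bits that read back as 0.
 * An erased page is assumed to read as all 0xFF, so each 0 bit
 * is one deviation from the erased state.
 */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t inverted = (uint8_t)~buf[i];

		/* Clear the lowest set bit until none are left. */
		while (inverted) {
			inverted &= (uint8_t)(inverted - 1);
			zeros++;
		}
	}

	return zeros;
}

/*
 * Hypothetical check: treat the page as erased if it has at most
 * 'threshold' zero bits. A threshold of 0 only accepts pages that
 * read back as pure 0xFF.
 */
static bool page_looks_erased(const uint8_t *buf, size_t len,
			      unsigned int threshold)
{
	return count_zero_bits(buf, len) <= threshold;
}
```

With a generous threshold, a page that was interrupted early in
programming and shows only a handful of 0 bits passes this check and
gets written again, which is the failure described at the start of the
thread. Lowering the threshold makes the check stricter, but since the
reported pages sometimes showed only a single bitflip and the count
varied between reads, a threshold alone may not separate the two cases
reliably.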