Re: Prevent Nand page writes on Power failure

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Thu, 21 Feb 2019 12:37:09 +0100

On Thu, 21 Feb 2019 12:21:38 +0100
Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:

> On Thu, Feb 21, 2019 at 11:36:46AM +0100, Boris Brezillon wrote:
> > On Thu, 21 Feb 2019 11:17:47 +0100
> > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> >   
> > > Hi Boris,
> > > 
> > > On Thu, Feb 21, 2019 at 09:10:55AM +0100, Boris Brezillon wrote:  
> > > > Hi Sascha,
> > > > 
> > > > On Wed, 20 Feb 2019 14:58:20 +0100
> > > > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> > > >     
> > > > > Hi All,
> > > > > 
> > > > > I have hardware here for which the normal way to turn off is just to cut
> > > > > the power. When the powercut happens during a NAND page write then we
> > > > > get more or less completely written pages during next boot. Very rarely
> > > > > it seems to happen that such a half written page with only very few
> > > > > flipped bits is erroneously detected as empty and written again which
> > > > > then results in ECC errors when reading the data.    
> > > > 
> > > > This should definitely be fixed, maybe by lowering the bitflip
> > > > threshold when doing the empty check. Do you know the ECC strength and
> > > > the number of bitflips you have when that problem occurs?    
> > > 
> > > The problem is that these half written pages do not seem to be very
> > > stable. It happens that the number of bitflips changes with each read.
> > > I have seen pages which can be read sometimes and sometimes not. It
> > > really seems that half written pages must be avoided entirely.  
> > 
> > But when they are correctly read, do you know how many bitflips they
> > have?  
> 
> Yes, I know this number,

Is it high? I mean, can we tweak the threshold to detect such cases?
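
To make the threshold idea concrete, here is a minimal userspace sketch
of an "is this chunk erased?" check with a tunable bitflip threshold.
It is only a simplified model: the raw NAND core has
nand_check_erased_ecc_chunk() for the real thing, and the helper names
below (count_zero_bits, chunk_looks_erased) are made up for
illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the bits that read back as 0. An erased NAND page reads as
 * all 0xFF, so every 0 bit is either programmed data or a bitflip.
 */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned int inv = (~buf[i]) & 0xFFu;

		/* Clear the lowest set bit until none are left. */
		while (inv) {
			inv &= inv - 1;
			zeros++;
		}
	}

	return zeros;
}

/*
 * Hypothetical helper: treat the chunk as erased only if it has at
 * most 'threshold' zero bits. Lowering 'threshold' makes the check
 * stricter, so fewer half-written pages pass as empty.
 */
static int chunk_looks_erased(const uint8_t *buf, size_t len,
			      unsigned int threshold)
{
	return count_zero_bits(buf, len) <= threshold;
}
```

With a 512-byte chunk that has two zero bits, chunk_looks_erased()
accepts it at threshold 2 and rejects it at threshold 1. Whether any
threshold reliably separates half-written pages from erased ones is
exactly what the bitflip numbers would tell us.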

> but as said, it varies when reading the same
> page again.

Yes, I can imagine: since the cells were only weakly programmed, their
state might change from one read to another. What I'm mainly interested
in is the state on the first read after the powercut.

> 
> > To be honest, I fear not all users will be able to be informed
> > that powercuts are about to happen, and we need a way to fix that for
> > everyone.  
> 
> If you want that, I'm afraid we can't fix this on this level. You can't
> rely on the data when a powercut happens during write. It may look ok at
> first, but bitflips can develop later. I just found an interesting read
> here:
> https://www.datalight.com/blog/2017/03/08/enemy-of-nand-flash-memory/
> 
> If you want to fix it for all users you have to track somehow which
> pages you have written last and discard the data or copy it to another
> block.

Did you see pages written with valid data turn into uncorrectable pages
on the second read attempt? Or is it only erased/empty pages that turn
out to be not quite erased, so that the next write to them ends up with
uncorrectable errors?

Other than that, yes, tracking which block was written last, so that you
can move the data somewhere else after an unclean detach/unmount, is an
option. I don't know if UBIFS has this information (UBI doesn't, that's
for sure).
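
A rough sketch of the decision logic behind that kind of tracking,
purely as an illustration. Everything here (struct write_tracker, the
tracker_* helpers, NO_BLOCK) is hypothetical and does not exist in UBI
or UBIFS. It also assumes the state is stored somewhere that survives
a powercut, which is the hard part and is not addressed here.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_BLOCK UINT32_MAX

/* Hypothetical state, assumed to survive a powercut. */
struct write_tracker {
	uint32_t last_written_block;
	bool clean_shutdown;
};

static void tracker_init(struct write_tracker *t)
{
	t->last_written_block = NO_BLOCK;
	t->clean_shutdown = true;
}

/* Record the block before programming a page in it. */
static void tracker_note_write(struct write_tracker *t, uint32_t block)
{
	t->last_written_block = block;
	t->clean_shutdown = false;
}

/* Called on a regular detach/unmount. */
static void tracker_note_clean_detach(struct write_tracker *t)
{
	t->clean_shutdown = true;
}

/*
 * Called on attach: returns the block whose data should be moved
 * elsewhere, or NO_BLOCK if the previous detach was clean.
 */
static uint32_t tracker_block_to_relocate(const struct write_tracker *t)
{
	if (t->clean_shutdown)
		return NO_BLOCK;

	return t->last_written_block;
}
```

After an unclean detach, tracker_block_to_relocate() returns the last
block passed to tracker_note_write(). After a clean one it returns
NO_BLOCK and nothing needs to be moved.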

> 
> Sascha
> 

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/