Re: Prevent Nand page writes on Power failure

Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> · Thu, 21 Feb 2019 14:27:01 +0100

On Thu, Feb 21, 2019 at 12:37:09PM +0100, Boris Brezillon wrote:
> On Thu, 21 Feb 2019 12:21:38 +0100
> Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> 
> > On Thu, Feb 21, 2019 at 11:36:46AM +0100, Boris Brezillon wrote:
> > > On Thu, 21 Feb 2019 11:17:47 +0100
> > > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> > >   
> > > > Hi Boris,
> > > > 
> > > > On Thu, Feb 21, 2019 at 09:10:55AM +0100, Boris Brezillon wrote:  
> > > > > Hi Sascha,
> > > > > 
> > > > > On Wed, 20 Feb 2019 14:58:20 +0100
> > > > > Sascha Hauer <s.hauer@xxxxxxxxxxxxxx> wrote:
> > > > >     
> > > > > > Hi All,
> > > > > > 
> > > > > > I have hardware here for which the normal way to turn off is just to cut
> > > > > > the power. When the powercut happens during a NAND page write then we
> > > > > > get more or less completely written pages during next boot. Very rarely
> > > > > > it seems to happen that such a half written page with only very few
> > > > > > flipped bits is erroneously detected as empty and written again which
> > > > > > then results in ECC errors when reading the data.    
> > > > > 
> > > > > This should definitely be fixed, maybe by lowering the bitflip
> > > > > threshold when doing the empty check. Do you know the ECC strength and
> > > > > the number of bitflips you have when that problem occurs?    
> > > > 
> > > > The problem is that these half written pages do not seem to be very
> > > > stable. It happens that the number of bitflips change with each read.
> > > > I have seen pages which can be read sometimes and sometimes not. It
> > > > really seems that half written pages must be avoided entirely.  
> > > 
> > > But when they are correctly read, do you know how many bitflips they
> > > have?  
> > 
> > Yes, I know this number,
> 
> Is it high? I mean, can we tweak the threshold to detect such cases?

I haven't given you a number because there is no single one: I have seen
pages with a single bitflip, pages with 3 bitflips, pages with 8
bitflips, anything you like.
Funnily enough, once I write the following pages up to the end of the
block, the page which previously had only a few bitflips turns into a
nearly random pattern.
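
To illustrate why I don't think a threshold alone can separate these
cases, here is a minimal sketch of a threshold-based empty check. It is
not the actual MTD code; count_zero_bits(), page_looks_erased() and the
threshold parameter are hypothetical names used only for illustration.
An erased cell reads as 1, so the check counts the bits that read as 0
and accepts the page as empty when that count stays at or below the
threshold.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bits that read as 0; an erased NAND cell reads as 1. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t v = (uint8_t)~buf[i];	/* zero bits become one bits */

		while (v) {
			v &= v - 1;		/* clear the lowest set bit */
			zeros++;
		}
	}

	return zeros;
}

/*
 * Hypothetical empty check: treat the page as erased when at most
 * 'threshold' bits read as 0. Returns 1 for "looks erased", else 0.
 */
static int page_looks_erased(const uint8_t *buf, size_t len,
			     unsigned int threshold)
{
	return count_zero_bits(buf, len) <= threshold;
}
```

With a threshold of 4, an interrupted write that left only 3 programmed
bits passes as empty, while one that left 8 does not. Lowering the
threshold moves that boundary but does not remove it, and since the
count changes from read to read, the same page can land on either side.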

> 
> > but as said, it varies when reading the same
> > page again.
> 
> Yes, I can imagine, as cells were weakly programmed, their state might
> change from one read to another. What I'm mainly interested in is the
> state on the first read after the powercut.
> 
> > 
> > > To be honest, I fear not all users will be able to be informed
> > > that powercuts are about to happen, and we need a way to fix that for
> > > everyone.  
> > 
> > If you want that I'm afraid we can't fix this on this level. You can't
> > rely on the data when a powercut happens during write. It may look ok at
> > first, but bitflips can develop later. I just found an interesting read
> > here:
> > https://www.datalight.com/blog/2017/03/08/enemy-of-nand-flash-memory/
> > 
> > If you want to fix it for all users you have to track somehow which
> > pages you have written last and discard the data or copy it to another
> > block.
> 
> Did you have the problem that pages written with valid data turn into
> uncorrectable pages on the second read attempt or is it just
> erased/empty pages that turn out to be not so empty/erased and next
> time you write to them you end up with uncorrectable errors?

Both. (Seemingly) valid pages turned into uncorrectable pages, and, as
said above, pages turned into garbage once I wrote other pages in the
same block.

I haven't made any tests with power cuts during erase. Power cuts during
erase are probably not that critical because blocks without a valid
erase marker will be erased again next time.

Sascha

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
