Re: [RFC PATCH] UBI fixable bit-flip issue

Boris Brezillon <boris.brezillon@xxxxxxxxxxx> · Fri, 17 Aug 2018 16:53:22 +0200

On Sat, 18 Aug 2018 00:33:25 +1000
Mark Spieth <mspieth@xxxxxxxxxxxxxxxxx> wrote:

> >> I hope this description is clear enough.  
> > Well, I think selecting the bitflip threshold properly is really
> > important, simply because some NANDs (including SLC NANDs) are showing
> > bitflips even on blocks that have a low EC. Check the NAND ECC
> > requirements, and if it's something like 8bit/512bytes, I guess that's
> > more or less expected (it all depends on how many bitflips you have in
> > the faulty block). It's less likely on NANDs requiring 1bit/512bytes
> > ECC, and if that happens on such NANDs, you may have a problem in the
> > controller driver.  
> 4 bits ECC per 512 bytes, from memory 28 bytes in OOB, using software 
> ECC in the MTD driver.
> As I said, I believe the better threshold is hiding the root cause. It 
> is only a band-aid.

What you describe will happen sooner or later regardless: if you're using
almost all LEBs, and the remaining free ones are all affected by the
correctable bit-flip issue, you'll have to use them anyway. So, yes,
this is a band-aid, just as your solution improves things without
really solving the issue. That being said, if the blocks really show
too many bitflips, they should be marked bad at some point, because
during the scrubbing process we write a pattern and check that we can
read it back. I'll have to double check, but I think we're also
checking for EUCLEAN and marking the block bad when that happens.

Another option would be to order free blocks not only by
descending erase counters, but also by the number of times the upper
layer reported EUCLEAN on them. That would require adding a new field to
the EC header, but I think both Richard and I are open to discussing that.

Regards,

Boris

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/