Re: [PATCH v2 3/3] mtd: rawnand: micron: Address the shallow erase issue

Hello Steve,

On Sun, 3 May 2020 09:10:15 -0700
Steve deRosier <derosier@xxxxxxxxx> wrote:

> > +static bool micron_nand_with_shallow_erase_issue(struct nand_chip *chip)
> > +{
> > +       /*
> > +        * The shallow erase issue has been observed with MT29F*G*A
> > +        * parts but Micron suspects that the issue can happen with
> > +        * almost all recent SLC but at such a low probability that it
> > +        * is almost invisible. Nevertheless, as we mitigate the
> > +        * performance penalty at runtime by following the number of
> > +        * written pages in a block before erasing it, we may want to
> > +        * enable this fix by default.
> > +        */
> > +       return nand_is_slc(chip);
> > +}  
> 
> 
> Whoa, let's hold our horses here!  "almost all recent" would imply
> that older SLCs aren't affected. And the likelihood that Micron will
> fix newer parts is high - because why would they leave in a major bug
> like that in the next mask? So, what you're saying is when someone
> goes to upgrade their older device's Linux they're going to take a
> major filesystem performance hit without knowing it (because
> realistically who reads 10,000s of patches before upgrading) when
> their chip doesn't need it.

I do agree with what you say, but... (see below).

> Because we're too lazy to get the list from Micron and code that ugliness?

Too lazy to get the list from Micron?! I can tell you we've tried hard
and they've always been reluctant to give us such a list, or a
discriminant to identify those buggy parts. They even tried to
convince us it was not a bug but a problem inherent to all NAND
flashes, not only theirs. They didn't go as far as claiming this was a
feature, but given the energy they spent denying the problem I
honestly thought they would. So no, it's definitely not what you think,
and I was hoping that threatening to merge that patch upstream
would force them to provide us with this information. Looks like it
never happened.

Maybe those who had to debug those weird, hard-to-reproduce issues
should speak up, because it's no fun to spend weeks or months chasing
such bugs just to discover that Micron knows about the issue and can
provide a fix if you ask them.

> 
> We put this in and the resulting discussions for embedded systems
> designers for the next decade are going to be one of two things:
> * Oh, you want to use that SLC NAND from Micron? Well then don't use
> Linux because it performs crappy on Micron SLC NANDs.
> OR
> * Oh, you want to use Linux? Well, don't use a Micron SLC NAND then
> because they perform crappy on Linux.
> 
> Let's get a list of all chips that have this bug (and let's be clear -
> it's a bug, not a "quirk") and enable it for those chips specifically.
> Even better if there was something in the chipinfo itself that made it
> obvious which ones had the problem (because realistically it's
> probably specific to a particular geometry). In any case, it's in the
> best interest of Micron to identify to us exactly which chips have or
> are likely to have this issue and for us to be specific on which get
> assigned this quirk. It is probably listed in an errata app-note, and
> if not it should be.
> 
> Strong NAK to defaulting all Micron SLC NANDs to this - unless it
> truly is the case that _all_ Micron SLC NANDs in the past and in the
> future likely have this problem.
> 

I honestly don't have a good solution here. I guess we could blacklist
flashes one by one when people report weird issues, but by the time
they discover the problem it is already too late, and they have plenty
of units in the wild.
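
For illustration only, a per-part denylist could look roughly like the
sketch below. Everything in it is hypothetical: the function name, the
table, and its entries are placeholders, not part numbers confirmed by
Micron, and a real version would have to match on whatever model string
the driver actually exposes for the chip.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical denylist of model-name prefixes reported to show the
 * shallow erase issue. The entries are placeholders, not real part
 * numbers.
 */
static const char * const shallow_erase_denylist[] = {
	"EXAMPLE-PART-A",
	"EXAMPLE-PART-B",
};

/* Return true if @model starts with one of the denylisted prefixes. */
static bool model_has_shallow_erase_issue(const char *model)
{
	size_t i;

	if (!model)
		return false;

	for (i = 0; i < sizeof(shallow_erase_denylist) /
			sizeof(shallow_erase_denylist[0]); i++) {
		const char *prefix = shallow_erase_denylist[i];

		/* Prefix match, so speed/package suffixes are ignored. */
		if (!strncmp(model, prefix, strlen(prefix)))
			return true;
	}

	return false;
}
```

The weakness is the one described above: an entry only lands in the
table after somebody has already hit the bug in the field.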

Regards,

Boris

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


