Re: [PATCH] mtd: rawnand: micron: handle "ecc off" devices correctly

Hi Lucas, Marco,

Lucas Stach <l.stach@xxxxxxxxxxxxxx> wrote on Fri, 26 Jul 2019 10:54:11 +0200:

> Hi Miguel,
> 
> Am Freitag, den 26.07.2019, 10:28 +0200 schrieb Miquel Raynal:
> > Hi Marco,
> > 
> > + Richard
> > + Working e-mail address for Boris
> >   
> > > Marco Felsch <m.felsch@xxxxxxxxxxxxxx> wrote on Fri, 26 Jul 2019 09:44:34 +0200:
> >   
> > > Some devices don't support ecc "official". By "official" I mean that the

                                 ^ uppercase ECC

> > > feature can be set trough the "SET FEATURE (EFh)" command but isn't
> > > reported to the "READ ID Parameter Tables". Because the "ECC Field"
> > > still says that it is disabled. This is applicable at least
> > > for the MT29F2G08ABAGA and MT29F2G08ABBGA devices. Even worse the
> > > datasheet describes the ECC feature in chapter "ECC Protection".

What about:

"Some devices are not supposed to support on-die ECC, but experience
shows that the internal ECC machinery can actually be enabled through
the "SET FEATURE (EFh)" command, even if a read of the "READ ID
Parameter Tables" reports that it is not."

> > > 
> > > Currently the driver checks the "READ ID Parameter" field directly after
> > > we enabled the feature. If the check fails we return immediately but
> > > leave the ECC on. Now all future read/program cycles goes trough the ecc
> > > and the host nfc gets confused and reports ECC errors.

And here:

"Currently, the driver checks the "READ ID Parameter" field
directly after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction."
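
To make the ordering issue concrete, here is a minimal standalone sketch
of the two probe orders. It is not the kernel code: the sim_* names, the
struct and the bit value are hypothetical stand-ins, and the final
"is ECC mandatory" re-read done by the real function is left out. It only
models a chip that honours SET FEATURE but never reports the ECC state in
READ ID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's "ECC enabled" ID bit (assumed value). */
#define SIM_ID_ECC_ENABLED 0x80u

enum sim_result { SIM_ON_DIE_UNSUPPORTED, SIM_ON_DIE_SUPPORTED };

/* Simulated chip: state of the ECC unit and whether READ ID reflects it. */
struct sim_chip {
	bool reports_ecc;	/* false for the quirky "ECC off" parts */
	bool ecc_on;		/* actual state of the on-die ECC unit */
};

/* Models SET FEATURE: the simulated chip always honours the request. */
int sim_ecc_setup(struct sim_chip *chip, bool enable)
{
	chip->ecc_on = enable;
	return 0;
}

/* Models READ ID byte 4: the bit is only set if the chip reports it. */
uint8_t sim_read_id4(const struct sim_chip *chip)
{
	return (chip->reports_ecc && chip->ecc_on) ? SIM_ID_ECC_ENABLED : 0;
}

/* Old order: the bit is checked before ECC is disabled again. */
enum sim_result probe_old_order(struct sim_chip *chip)
{
	uint8_t id4;

	if (sim_ecc_setup(chip, true))
		return SIM_ON_DIE_UNSUPPORTED;
	id4 = sim_read_id4(chip);
	if (!(id4 & SIM_ID_ECC_ENABLED))
		return SIM_ON_DIE_UNSUPPORTED;	/* returns with ECC still on */
	if (sim_ecc_setup(chip, false))
		return SIM_ON_DIE_UNSUPPORTED;
	return SIM_ON_DIE_SUPPORTED;
}

/* New order: ECC is disabled right after READ ID, then the bit is checked. */
enum sim_result probe_new_order(struct sim_chip *chip)
{
	uint8_t id4;

	if (sim_ecc_setup(chip, true))
		return SIM_ON_DIE_UNSUPPORTED;
	id4 = sim_read_id4(chip);
	if (sim_ecc_setup(chip, false))
		return SIM_ON_DIE_UNSUPPORTED;
	if (!(id4 & SIM_ID_ECC_ENABLED))
		return SIM_ON_DIE_UNSUPPORTED;	/* ECC is already off here */
	return SIM_ON_DIE_SUPPORTED;
}
```

With a chip where reports_ecc is false, both variants return "unsupported",
but only probe_new_order() leaves ecc_on cleared afterwards.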

> > > To address this in a common way we need to turn off the ECC directly
> > > after reading the "READ ID Parameter" and before checking the
> > > "ECC status".
> > > 
> > > Signed-off-by: Marco Felsch <m.felsch@xxxxxxxxxxxxxx>  
> > 
> > Good catch! However you report that on-die ECC correction is working
> > but you still disable it; any reason to do so? Would it be better to
> > actually enable on-die ECC and explicitly mark these two chips as
> > buggy (see [1] for checking the chip IDs)?  
> 
> It's the other way around. The chip is not supposed to have on-die ECC
> according to the datasheet and correctly reflects this fact in the
> READ_ID, so Linux should not try to use the on-die ECC.

OK, I understood the opposite because of the "Even worse the datasheet
describes the ECC feature [...]" sentence, which implied to me that the
on-die ECC feature was actually expected despite the status bit not
being set.

Marco, can you rephrase the commit log a bit? I proposed something
above, feel free to adapt.

> The bug is that the NAND is not supposed to have on-die ECC and reports
> this correctly, but then actually enables an on-die ECC unit when asked
> to, probably due to the same die being used for on-die ECC and ECC off
> devices. The consequence is that Linux (correctly) assumes that the
> full OOB size is available to the controller, but the on-die ECC unit
> scribbles over some of the OOB data.
> 
> I think this fix is the most robust solution, as it makes sure to disable
> the on-die ECC unit to avoid the issue, which might also be present on
> other NAND chips we don't know about yet.
> 
> Regards,
> Lucas 
> 
> > [1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_macronix.c#L83
> >   
> > > ---
> > >  drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
> > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> > > index 1622d3145587..fb199ad2f1a6 100644
> > > --- a/drivers/mtd/nand/raw/nand_micron.c
> > > +++ b/drivers/mtd/nand/raw/nand_micron.c
> > > @@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
> > >  	    (chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
> > >  		return MICRON_ON_DIE_UNSUPPORTED;
> > >  
> > > +	/*
> > > +	 * It seems that there are devices which do not support ECC official.
> > > +	 * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
> > > +	 * enabling the ECC feature but don't reflect that to the READ_ID table.
> > > +	 * So we have to guarantee that we disable the ECC feature directly
> > > +	 * after we did the READ_ID table command. Later we can evaluate the
> > > +	 * ECC_ENABLE support.
> > > +	 */
> > >  	ret = micron_nand_on_die_ecc_setup(chip, true);
> > >  	if (ret)
> > >  		return MICRON_ON_DIE_UNSUPPORTED;
> > > @@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
> > >  	if (ret)
> > >  		return MICRON_ON_DIE_UNSUPPORTED;
> > >  
> > > -	if (!(id[4] & MICRON_ID_ECC_ENABLED))
> > > -		return MICRON_ON_DIE_UNSUPPORTED;
> > > -
> > >  	ret = micron_nand_on_die_ecc_setup(chip, false);
> > >  	if (ret)
> > >  		return MICRON_ON_DIE_UNSUPPORTED;
> > >  
> > > +	if (!(id[4] & MICRON_ID_ECC_ENABLED))
> > > +		return MICRON_ON_DIE_UNSUPPORTED;
> > > +
> > >  	ret = nand_readid_op(chip, 0, id, sizeof(id));
> > >  	if (ret)
> > >  		return MICRON_ON_DIE_UNSUPPORTED;
> > 
> > Thanks,
> > Miquèl
> >   


Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/



