Re: Power cut leads to "corrupt empty space"

On 27.2.2020 17.16, Fabio Estevam wrote:
> Hi Timo,
> 
> On Thu, Feb 27, 2020 at 10:42 AM Timo Ketola <Timo.Ketola@xxxxxxxxxx> wrote:
> 
>> That might take considerable effort. Do you think there should be
>> fixes for this? Would they be on the recovery side, or in preventing
>> the issue from happening in the first place?
> 
> It is hard to tell. 4.9.88 is an old version, so better try with mainline.
> 

OK, I managed to get v5.4 booting, almost.

First, we had the 'fsl,legacy-bch-geometry;' flag in the device tree, and I
couldn't find a 'standard way' to get the same effect in this kernel. I
had to put 'nand-ecc-strength = <8>; nand-ecc-step-size = <512>;' into
the device tree and make this change in
drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c:

> @@ -507,11 +507,11 @@ static int common_nfc_set_geometry(struct gpmi_nand_data *this)
>  	struct nand_chip *chip = &this->nand;
>  
>  	if (chip->ecc.strength > 0 && chip->ecc.size > 0)
>  		return set_geometry_by_ecc_info(this, chip->ecc.strength,
>  						chip->ecc.size);
> -
> +	return legacy_set_geometry(this);
>  	if ((of_property_read_bool(this->dev->of_node, "fsl,use-minimum-ecc"))
>  				|| legacy_set_geometry(this)) {
>  		if (!(chip->base.eccreq.strength > 0 &&
>  		      chip->base.eccreq.step_size > 0))
>  			return -EINVAL;

That is, call legacy_set_geometry() unconditionally and return its
result, without then falling back to set_geometry_by_ecc_info(). After
this, it began to read the first half of the NAND correctly.

Then there is a bug (I think) in the NAND chip, an S34ML16G2. It has four
S34ML04G2 dies and two chip selects in the package, and it shows up as
two chips. It reports 128 KiB per eraseblock (EB), 8192 EBs per LUN and
2 LUNs, making up 2 GiB. That is correct for the whole package, but Linux
then finds two such chips, a total of 4 GiB, which is wrong. So I have
this in drivers/mtd/nand/raw/nand_base.c:

> @@ -4733,12 +4760,36 @@ static int nand_detect(struct nand_chip *chip, struct nand_flash_dev *type)
>  	if (!type->name || !type->pagesize) {
>  		/* Check if the chip is ONFI compliant */
>  		ret = nand_onfi_detect(chip);
>  		if (ret < 0)
>  			return ret;
> -		else if (ret)
> +		else if (ret) {
> +			if (type->name) {
> +				struct nand_device *nand = &chip->base;
> +				unsigned luns;
> +
> +				pr_info("%s detected\n", type->name);
> +				pr_info("luns %d, eraseblocks %d, pages %d, page size %d\n",
> +						nand->memorg.luns_per_target,
> +						nand->memorg.eraseblocks_per_lun,
> +						nand->memorg.pages_per_eraseblock,
> +						nand->memorg.pagesize);
> +				pr_info("sizes: page 0x%X, erase 0x%X, chip 0x%X\n",
> +						type->pagesize,
> +						type->erasesize,
> +						type->chipsize);
> +				luns = DIV_ROUND_DOWN_ULL((u64)type->chipsize << 20,
> +						nand->memorg.pagesize *
> +						nand->memorg.pages_per_eraseblock *
> +						nand->memorg.eraseblocks_per_lun);
> +				if (nand->memorg.luns_per_target != luns) {
> +					printk(KERN_INFO "Correcting luns-per-target to %u\n", luns);
> +					nand->memorg.luns_per_target = luns;
> +				}
> +			}
>  			goto ident_done;
> +		}
>  
>  		/* Check if the chip is JEDEC compliant */
>  		ret = nand_jedec_detect(chip);
>  		if (ret < 0)
>  			return ret;

Output:

> nand: NAND 1GiB 3,3V 8-bit detected
> nand: luns 2, eraseblocks 8192, pages 64, page size 2048
> nand: sizes: page 0x0, erase 0x0, chip 0x400
> Correcting luns-pre-target to 1
> nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
> nand: AMD/Spansion S34ML16G2
> nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
> nand: 2 chips detected

That idea worked on the v4.9 i.MX kernel but not here: the driver reports
ECC errors for the second half of the NAND. I have debugged down into the
gpmi driver and checked that the page address is as it should be (e.g.
realpage 524288 (0x80000) and page 0 in nand_do_read_ops() for the first
page of the second half) and that target selection changes correctly. But
it reads only FFs. Still, it seems to erase the correct blocks when trying
to write BBTs.

I added this debug print to drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c:

> @@ -2270,10 +2270,18 @@ static struct dma_async_tx_descriptor *gpmi_chain_command(
>  
>  	transfer->direction = DMA_TO_DEVICE;
>  
>  	desc = dmaengine_prep_slave_sg(channel, &transfer->sgl, 1, DMA_MEM_TO_DEV,
>  				       MXS_DMA_CTRL_WAIT4END);
> +	if (1) {
> +		unsigned i;
> +		char b[160], *p;
> +
> +		p = b + sprintf(b, "Transfer from/to chip %d, pio[0] %X, naddr %d, addr", chip, pio[0], naddr);
> +		for (i = 0; i < naddr; ++i) p += sprintf(p, " %02X", addr[i]);
> +		pr_info("%s\n", b);
> +	}
>  	return desc;
>  }
>  

and I see

> Transfer from/to chip 1, pio[0] 930004, naddr 3, addr C0 FF 07

for erases, which seem to work, and

> Transfer from/to chip 1, pio[0] 930006, naddr 5, addr 00 00 C0 FF 07

for reads and writes, which fail.

I'm really stuck.

--

Timo
______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/


