Re: Power cut leads to "corrupt empty space"

Adding Han Xu and Miquel

On Sat, Feb 29, 2020 at 9:46 AM Timo Ketola <Timo.Ketola@xxxxxxxxxx> wrote:
>
> On 27.2.2020 17.16, Fabio Estevam wrote:
> > Hi Timo,
> >
> > On Thu, Feb 27, 2020 at 10:42 AM Timo Ketola <Timo.Ketola@xxxxxxxxxx> wrote:
> >
> >> That might take considerable effort. Would you think, there should be
> >> fixes for this? Would it be on recovery side or preventing the issue
> >> happening in the first place?
> >
> > It is hard to tell. 4.9.88 is an old version, so better try with mainline
> >
>
> Ok, I managed to get v5.4 booting - almost.
>
> First, we had 'fsl,legacy-bch-geometry;' flag in device tree and I
> couldn't find how I would get the same effect in this kernel in a
> 'standard way'. I had to put 'nand-ecc-strength = <8>;
> nand-ecc-step-size = <512>;' into the device tree and make this change
> in drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c:
>
> > @@ -507,11 +507,11 @@ static int common_nfc_set_geometry(struct gpmi_nand_data *this)
> >       struct nand_chip *chip = &this->nand;
> >
> >       if (chip->ecc.strength > 0 && chip->ecc.size > 0)
> >               return set_geometry_by_ecc_info(this, chip->ecc.strength,
> >                                               chip->ecc.size);
> > -
> > +     return legacy_set_geometry(this);
> >       if ((of_property_read_bool(this->dev->of_node, "fsl,use-minimum-ecc"))
> >                               || legacy_set_geometry(this)) {
> >               if (!(chip->base.eccreq.strength > 0 &&
> >                     chip->base.eccreq.step_size > 0))
> >                       return -EINVAL;
>
> That is, call legacy_set_geometry unconditionally without then calling
> set_geometry_by_ecc_info. After this it began to read the first half of
> the NAND correctly.
>
> Then there is a bug (I think) in the NAND chip S34ML16G2. It has four
> S34ML04G2 dies and two chip selects in the package and shows up as two
> chips. It reports 128KiB per EB, 8192 EBs per LUN and 2 LUNs making up
> 2GiB. This is correct for the package but then Linux finds two such
> chips, total of 4GiB, which is not correct. So I have this in
> drivers/mtd/nand/raw/nand_base.c:
>
> > @@ -4733,12 +4760,36 @@ static int nand_detect(struct nand_chip *chip, struct nand_flash_dev *type)
> >       if (!type->name || !type->pagesize) {
> >               /* Check if the chip is ONFI compliant */
> >               ret = nand_onfi_detect(chip);
> >               if (ret < 0)
> >                       return ret;
> > -             else if (ret)
> > +             else if (ret) {
> > +                     if (type->name) {
> > +                             struct nand_device *nand = &chip->base;
> > +                             unsigned luns;
> > +
> > +                             pr_info("%s detected\n", type->name);
> > +                             pr_info("luns %d, eraseblocks %d, pages %d, page size %d\n",
> > +                                             nand->memorg.luns_per_target,
> > +                                             nand->memorg.eraseblocks_per_lun,
> > +                                             nand->memorg.pages_per_eraseblock,
> > +                                             nand->memorg.pagesize);
> > +                             pr_info("sizes: page 0x%X, erase 0x%X, chip 0x%X\n",
> > +                                             type->pagesize,
> > +                                             type->erasesize,
> > +                                             type->chipsize);
> > +                             luns = DIV_ROUND_DOWN_ULL((u64)type->chipsize << 20,
> > +                                             nand->memorg.pagesize *
> > +                                             nand->memorg.pages_per_eraseblock *
> > +                                             nand->memorg.eraseblocks_per_lun);
> > +                             if (nand->memorg.luns_per_target != luns) {
> > +                                     printk("Correcting luns-per-target to %d", luns);
> > +                                     nand->memorg.luns_per_target = luns;
> > +                             }
> > +                     }
> >                       goto ident_done;
> > +             }
> >
> >               /* Check if the chip is JEDEC compliant */
> >               ret = nand_jedec_detect(chip);
> >               if (ret < 0)
> >                       return ret;
>
> output:
>
> > nand: NAND 1GiB 3,3V 8-bit detected
> > nand: luns 2, eraseblocks 8192, pages 64, page size 2048
> > nand: sizes: page 0x0, erase 0x0, chip 0x400
> > Correcting luns-per-target to 1
> > nand: device found, Manufacturer ID: 0x01, Chip ID: 0xd3
> > nand: AMD/Spansion S34ML16G2
> > nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
> > nand: 2 chips detected
>
> That idea worked on v4.9 imx kernel but not here. The driver reports ECC
> errors for the second half of the NAND. I have debugged down to gpmi
> driver and checked that page address is as it should be (e.g. realpage 524288,
> page 0 0x80000 in nand_do_read_ops for the first page of the second
> half) and target selection changes correctly. But it reads only FFs.
> Still, it seems to erase correct blocks when trying to write BBTs.
>
> I put this in drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c:
>
> > @@ -2270,10 +2270,18 @@ static struct dma_async_tx_descriptor *gpmi_chain_command(
> >
> >       transfer->direction = DMA_TO_DEVICE;
> >
> >       desc = dmaengine_prep_slave_sg(channel, &transfer->sgl, 1, DMA_MEM_TO_DEV,
> >                                      MXS_DMA_CTRL_WAIT4END);
> > +     if (1) {
> > +             unsigned i;
> > +             char b[160], *p;
> > +
> > +             p = b + sprintf(b, "Transfer from/to chip %d, pio[0] %X, naddr %d, addr", chip, pio[0], naddr);
> > +             for (i = 0; i < naddr; ++i) p += sprintf(p, " %02X", addr[i]);
> > +             pr_info("%s\n", b);
> > +     }
> >       return desc;
> >  }
> >
>
> and see
>
> > Transfer from/to chip 1, pio[0] 930004, naddr 3, addr C0 FF 07
>
> for erase, which seems to work and
>
> > Transfer from/to chip 1, pio[0] 930006, naddr 5, addr 00 00 C0 FF 07
>
> for reads/writes, which fail.
>
> I'm really stuck.
>
> --
>
> Timo
