On Tue, 30 Jul 2019 10:38:22 +0200
Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:

> Hi Greg,
>
> Greg Ungerer <gerg@xxxxxxxxxx> wrote on Tue, 30 Jul 2019 16:06:55 +1000:
>
> > Hi Miquel,
> >
> > On 30/7/19 10:41 am, Greg Ungerer wrote:
> > > On 30/7/19 10:28 am, Greg Ungerer wrote:
> > >> On 29/7/19 10:47 pm, Miquel Raynal wrote:
> > >>> Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 22:33:56 +1000:
> > >>>> On 29/7/19 6:36 pm, Miquel Raynal wrote:
> > >>>>> Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 16:41:51 +1000:
> > > [snip]
> > >>>>>> nand: timing mode 5 not acknowledged by the NAND chip
> > >>>>>
> > >>>>> What is the final timing mode used? Most of us tested in mode 5 I
> > >>>>> guess, maybe mode 4 is broken (I don't know if this is the one used
> > >>>>> here, nor why mode 5 is refused). Can you please try limiting the
> > >>>>> mode to 0, 1, 2... until, hopefully, we narrow down to the failing
> > >>>>> mode.
> > >>>>
> > >>>> Sure, how to do that?
> > >>>
> > >>> This loop [1] tries to configure each mode (5, 4, ...) until one
> > >>> succeeds (default is 0: must always work). Please try to limit mode to
> > >>> 0, 1, etc.
> > >>>
> > >>> Mode 0 should work.
> > >>>
> > >>> [1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_base.c#L933
> > >>
> > >> The normal behavior - which usually works - has
> > >> chip->onfi_timing_mode_default=5 here. So in other words on the first pass
> > >> through this loop it is checking mode 5, and setting it as the default.
> > >>
> > >> I am running a test/reboot loop now waiting for failure to see
> > >> if it is still using mode 5 in that case.
> > >
> > > With this trace in place:
> > >
> > > --- a/linux/drivers/mtd/nand/raw/nand_base.c
> > > +++ b/linux/drivers/mtd/nand/raw/nand_base.c
> > > @@ -910,6 +910,7 @@ static int nand_init_data_interface(struct nand_chip *chip)
> > >          }
> > >
> > >          for (mode = fls(modes) - 1; mode >= 0; mode--) {
> > > +                printk("%s(%d): checking mode=%d\n", __FILE__, __LINE__, mode);
> > >                  ret = onfi_fill_data_interface(chip, NAND_SDR_IFACE, mode);
> > >                  if (ret)
> > >                          continue;
> > > @@ -923,10 +924,12 @@ static int nand_init_data_interface(struct nand_chip *chip)
> > >                                                   &chip->data_interface);
> > >                  if (!ret) {
> > >                          chip->onfi_timing_mode_default = mode;
> > > +                        printk("%s(%d): BREAKING AT mode=%d\n", __FILE__, __LINE__, mode);
> > >                          break;
> > >                  }
> > >          }
> > >
> > > +        printk("%s(%d): chip->onfi_timing_mode_default=%d\n", __FILE__, __LINE__, chip->onfi_timing_mode_default);
> > >          return 0;
> > >  }
> > >
> > >
> > > First NAND failure gives this:
> > >
> > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > > nand: Micron MT29F2G08ABAEAWP
> > > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > gpmi-nand 1806000.gpmi-nand: use legacy bch geometry
> > > drivers/mtd/nand/raw/nand_base.c(913): checking mode=5
> > > drivers/mtd/nand/raw/nand_base.c(927): BREAKING AT mode=5
> > > drivers/mtd/nand/raw/nand_base.c(932): chip->onfi_timing_mode_default=5
> > > gpmi-nand 1806000.gpmi-nand: DMA timeout, last DMA
> > > gpmi-nand 1806000.gpmi-nand: Show GPMI registers :
> > > gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x20830002
> > > gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x01c6800c
> > > gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00010101
> > > gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0xe0000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x23023336
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x000001ee
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0xff000001
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x00000100
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x05020000
> > > gpmi-nand 1806000.gpmi-nand: Show BCH registers :
> > > gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x00000100
> > > gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000010
> > > gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0x030a2080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x083e2080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x070a4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0x10da4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x070a4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x10da4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0e0 : 0x070a4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x0f0 : 0x10da4080
> > > gpmi-nand 1806000.gpmi-nand: offset 0x100 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x110 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x120 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x130 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x140 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x150 : 0x20484342
> > > gpmi-nand 1806000.gpmi-nand: offset 0x160 : 0x01000000
> > > gpmi-nand 1806000.gpmi-nand: offset 0x170 : 0x00000000
> > > gpmi-nand 1806000.gpmi-nand: BCH Geometry :
> > > GF length              : 13
> > > ECC Strength           : 8
> > > Page Size in Bytes     : 2110
> > > Metadata Size in Bytes : 10
> > > ECC Chunk0 Size in Bytes: 512
> > > ECC Chunkn Size in Bytes: 512
> > > ECC Chunk Count        : 4
> > > Payload Size in Bytes  : 2048
> > > Auxiliary Size in Bytes: 16
> > > Auxiliary Status Offset: 12
> > > Block Mark Byte Offset : 1999
> > > Block Mark Bit Offset  : 0
> > > gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -110
> > > nand: timing mode 5 not acknowledged by the NAND chip
> > > gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
> >
> > Not sure if this is a useful data point... But I modified that
> > nand_init_data_interface() loop to start checking from data mode 4.
> > So now on every boot it defaults to mode 4. That has been running
> > most of the day, up to 900 boot cycles now, no failures.
>
> Ok so after having chatted quite a bit with Boris, it is very likely
> that, for these chips, the timings in mode 5 are too tight. It could
> fail the GET_FEATURES once in mode 5. Can you please dump every single
> intermediate value in gpmi_nfc_compute_timings() (period, *_cycles,
> use of half periods, tRP, sample delay, etc) as well as the content
> of /sys/kernel/debug/clk/clk_summary (you'll need debugfs support
> enabled and mounted).

Not sure the clk will stay at the rate it was set to during the timing
selection. Can you also add a trace printing the result of
clk_get_rate(r->clock[0]) next to the requested hw->clk_rate here [1]?

[1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c#L711
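
For anyone following the thread, the mode-selection loop discussed above (try the highest supported timing mode first, fall back towards mode 0) and Greg's experiment of starting the search at mode 4 can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the kernel code: fls_sketch(), pick_timing_mode(), the max_mode cap and the two setup callbacks are hypothetical names invented for illustration, and the callback merely stands in for whatever the controller driver does when asked to accept a mode.

```c
#include <assert.h>

/*
 * User-space model of the descending mode search. All names here are
 * hypothetical and exist only for this illustration.
 */

/* 1-based index of the highest set bit, 0 if no bit is set (like fls()). */
static int fls_sketch(unsigned int x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Stand-in for the controller hook: returns 0 when the mode is accepted. */
typedef int (*setup_fn)(int mode);

/*
 * Walk from the highest mode advertised in the 'modes' bitmask down to 0,
 * skipping anything above 'max_mode' (the cap is an assumption modelling
 * the "start checking from mode 4" experiment). The first accepted mode
 * wins; mode 0 is the fallback if nothing is accepted.
 */
static int pick_timing_mode(unsigned int modes, int max_mode, setup_fn setup)
{
	int mode;

	assert(max_mode >= 0);

	for (mode = fls_sketch(modes) - 1; mode >= 0; mode--) {
		if (mode > max_mode)
			continue;
		if (setup(mode) == 0)
			return mode;
	}
	return 0;
}

/* Example callbacks: one accepts every mode, one refuses modes above 3. */
static int accept_all(int mode)
{
	(void)mode;
	return 0;
}

static int reject_above_3(int mode)
{
	return mode > 3 ? -1 : 0;
}
```

With a bitmask of 0x3f (modes 0 to 5 advertised), a cap of 5 and accept_all selects mode 5, which matches the trace above; lowering the cap to 4 selects mode 4, which is the configuration that survived the 900 boot cycles.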