Hi Greg,

Greg Ungerer <gerg@xxxxxxxxxx> wrote on Tue, 30 Jul 2019 16:06:55 +1000:

> Hi Miquel,
>
> On 30/7/19 10:41 am, Greg Ungerer wrote:
> > On 30/7/19 10:28 am, Greg Ungerer wrote:
> >> On 29/7/19 10:47 pm, Miquel Raynal wrote:
> >>> Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 22:33:56 +1000:
> >>>> On 29/7/19 6:36 pm, Miquel Raynal wrote:
> >>>>> Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 16:41:51 +1000:
> > [snip]
> >>>>>> nand: timing mode 5 not acknowledged by the NAND chip
> >>>>>
> >>>>> What is the final timing mode used? Most of us tested in mode 5 I
> >>>>> guess, maybe mode 4 is broken (don't know if this is the one used here,
> >>>>> neither why mode 5 is refused). Can you please try by limiting the mode
> >>>>> to 0, 1, 2... until, hopefully, we narrow down to the failing mode.
> >>>>
> >>>> Sure, how to do that?
> >>>
> >>> This loop [1] tries to configure each mode (5, 4, ...) until one
> >>> succeeds (default is 0: must always work). Please try to limit mode to
> >>> 0, 1, etc.
> >>>
> >>> Mode 0 should work.
> >>>
> >>> [1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_base.c#L933
> >>
> >> The normal behavior - which usually works - has
> >> chip->onfi_timing_mode_default=5 here. So in other words on the first pass
> >> through this loop it is checking mode 5, and setting it as the default.
> >>
> >> I am running a test/reboot loop now waiting for failure to see
> >> if it is still using mode 5 in that case.
> >
> > With this trace in place:
> >
> > --- a/linux/drivers/mtd/nand/raw/nand_base.c
> > +++ b/linux/drivers/mtd/nand/raw/nand_base.c
> > @@ -910,6 +910,7 @@ static int nand_init_data_interface(struct nand_chip *chip)
> >  	}
> >
> >  	for (mode = fls(modes) - 1; mode >= 0; mode--) {
> > +		printk("%s(%d): checking mode=%d\n", __FILE__, __LINE__, mode);
> >  		ret = onfi_fill_data_interface(chip, NAND_SDR_IFACE, mode);
> >  		if (ret)
> >  			continue;
> > @@ -923,10 +924,12 @@ static int nand_init_data_interface(struct nand_chip *chip)
> >  						 &chip->data_interface);
> >  		if (!ret) {
> >  			chip->onfi_timing_mode_default = mode;
> > +			printk("%s(%d): BREAKING AT mode=%d\n", __FILE__, __LINE__, mode);
> >  			break;
> >  		}
> >  	}
> >
> > +	printk("%s(%d): chip->onfi_timing_mode_default=%d\n", __FILE__, __LINE__, chip->onfi_timing_mode_default);
> >  	return 0;
> >  }
> >
> >
> > First NAND failure gives this:
> >
> > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
> > nand: Micron MT29F2G08ABAEAWP
> > nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > gpmi-nand 1806000.gpmi-nand: use legacy bch geometry
> > drivers/mtd/nand/raw/nand_base.c(913): checking mode=5
> > drivers/mtd/nand/raw/nand_base.c(927): BREAKING AT mode=5
> > drivers/mtd/nand/raw/nand_base.c(932): chip->onfi_timing_mode_default=5
> > gpmi-nand 1806000.gpmi-nand: DMA timeout, last DMA
> > gpmi-nand 1806000.gpmi-nand: Show GPMI registers :
> > gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x20830002
> > gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x01c6800c
> > gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00010101
> > gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0xe0000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x23023336
> > gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x000001ee
> > gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0xff000001
> > gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x00000100
> > gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x05020000
> > gpmi-nand 1806000.gpmi-nand: Show BCH registers :
> > gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x00000100
> > gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000010
> > gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0x030a2080
> > gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x083e2080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x070a4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0x10da4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x070a4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x10da4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0e0 : 0x070a4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x0f0 : 0x10da4080
> > gpmi-nand 1806000.gpmi-nand: offset 0x100 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x110 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x120 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x130 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x140 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x150 : 0x20484342
> > gpmi-nand 1806000.gpmi-nand: offset 0x160 : 0x01000000
> > gpmi-nand 1806000.gpmi-nand: offset 0x170 : 0x00000000
> > gpmi-nand 1806000.gpmi-nand: BCH Geometry :
> > GF length : 13
> > ECC Strength : 8
> > Page Size in Bytes : 2110
> > Metadata Size in Bytes : 10
> > ECC Chunk0 Size in Bytes: 512
> > ECC Chunkn Size in Bytes: 512
> > ECC Chunk Count : 4
> > Payload Size in Bytes : 2048
> > Auxiliary Size in Bytes: 16
> > Auxiliary Status Offset: 12
> > Block Mark Byte Offset : 1999
> > Block Mark Bit Offset : 0
> > gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -110
> > nand: timing mode 5 not acknowledged by the NAND chip
> > gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
>
> Not sure if this is a useful data point... But I modified that
> nand_init_data_interface() loop to start checking from data mode 4.
> So now on every boot it defaults to mode 4. That has been running
> most of the day, up to 900 boot cycles now, no failures.

Ok so after having chatted quite a bit with Boris, it is very likely
that, for these chips, the timings in mode 5 are too tight. It could
fail the GET_FEATURES once in mode 5.

Can you please dump every single intermediate value in
gpmi_nfc_compute_timings() (period, *_cycles, use of half periods, tRP,
sample delay, etc) as well as the content of
/sys/kernel/debug/clk/clk_summary (you'll need debugfs support enabled
and mounted).

Also, can you be sure that the NAND chip is powered with 3.3V?

Thanks,
Miquèl

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/
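
For readers following the thread, here is a minimal userspace sketch of the experiment described above, where the nand_init_data_interface() loop is made to start at mode 4 instead of mode 5. It is not the kernel code. The names fls_sketch(), pick_timing_mode(), mode_ok_fn, accept_all() and reject_mode5() are hypothetical and exist only for illustration; the callback stands in for onfi_fill_data_interface() plus the controller's check-only call, and the max_mode cap is an assumption about how the loop was limited.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's fls(): 1-based index of the
 * highest set bit, or 0 when no bit is set.
 */
int fls_sketch(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* Hypothetical callback: true if the chip/controller accepts this mode. */
typedef bool (*mode_ok_fn)(int mode);

/*
 * Walk the timing modes from fastest to slowest, as the quoted loop
 * does, but never start above max_mode (assumed cap, e.g. 4).
 * Returns the first accepted mode, or 0 if none is accepted, since
 * mode 0 is the default that must always work.
 */
int pick_timing_mode(unsigned int modes, int max_mode, mode_ok_fn mode_ok)
{
	int mode = fls_sketch(modes) - 1;

	assert(mode_ok);

	if (mode > max_mode)
		mode = max_mode;

	for (; mode >= 0; mode--) {
		if (mode_ok(mode))
			return mode;
	}
	return 0;
}

/* Sample callbacks for trying the sketch out. */
bool accept_all(int mode)
{
	(void)mode;
	return true;
}

bool reject_mode5(int mode)
{
	return mode != 5;
}
```

With modes 0 to 5 advertised (modes = 0x3f), pick_timing_mode(0x3f, 4, accept_all) returns 4, which mirrors the "start checking from data mode 4" test. pick_timing_mode(0x3f, 5, reject_mode5) also returns 4, which is the fallback one would expect if mode 5 were refused at check time rather than failing later.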