Re: GPMI iMX6ull timeout on DMA

Hi Miquel,

On 29/7/19 10:47 pm, Miquel Raynal wrote:
Hi Greg,

+ Boris

Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 22:33:56 +1000:

Hi Miquel,

On 29/7/19 6:36 pm, Miquel Raynal wrote:
Hi Greg,

One question below.

+Michael
+Sascha

Hello Michael, here is an issue similar to yours. I know you did not
have enough time to share your solution, but now someone else is
reproducing the issue. Would you mind sharing a branch or a patch, even
a WIP one, just to help debugging?

Greg Ungerer <gerg@xxxxxxxxxx> wrote on Mon, 29 Jul 2019 16:41:51 +1000:
Hi Miquel,

I am experiencing a problem with NAND flash DMA timeouts on
iMX6ull based boards. The problem is very similar to that
described in:

     https://linux-mtd.infradead.narkive.com/JIUulfFB/gpmi-imx6ull-timeout-on-dma

That thread didn't come to any specific resolution that I could
see.

The boot trace on the console for me looks like this:

nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
nand: Micron MT29F2G08ABAEAWP
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
gpmi-nand 1806000.gpmi-nand: DMA timeout, last DMA
gpmi-nand 1806000.gpmi-nand: Show GPMI registers :
gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x20830002
gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x01c6800c
gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00010101
gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0xe0000000
gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x23023336
gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x000001ee
gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0xff000001
gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x00000001
gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x05020000
gpmi-nand 1806000.gpmi-nand: Show BCH registers :
gpmi-nand 1806000.gpmi-nand: offset 0x000 : 0x00000100
gpmi-nand 1806000.gpmi-nand: offset 0x010 : 0x00000010
gpmi-nand 1806000.gpmi-nand: offset 0x020 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x030 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x040 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x050 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x060 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x070 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x080 : 0x030a2080
gpmi-nand 1806000.gpmi-nand: offset 0x090 : 0x083e2080
gpmi-nand 1806000.gpmi-nand: offset 0x0a0 : 0x070a4080
gpmi-nand 1806000.gpmi-nand: offset 0x0b0 : 0x10da4080
gpmi-nand 1806000.gpmi-nand: offset 0x0c0 : 0x070a4080
gpmi-nand 1806000.gpmi-nand: offset 0x0d0 : 0x10da4080
gpmi-nand 1806000.gpmi-nand: offset 0x0e0 : 0x070a4080
gpmi-nand 1806000.gpmi-nand: offset 0x0f0 : 0x10da4080
gpmi-nand 1806000.gpmi-nand: offset 0x100 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x110 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x120 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x130 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x140 : 0x00000000
gpmi-nand 1806000.gpmi-nand: offset 0x150 : 0x20484342
gpmi-nand 1806000.gpmi-nand: offset 0x160 : 0x01000000
gpmi-nand 1806000.gpmi-nand: offset 0x170 : 0x00000000
gpmi-nand 1806000.gpmi-nand: BCH Geometry :
GF length              : 13
ECC Strength           : 8
Page Size in Bytes     : 2110
Metadata Size in Bytes : 10
ECC Chunk0 Size in Bytes: 512
ECC Chunkn Size in Bytes: 512
ECC Chunk Count        : 4
Payload Size in Bytes  : 2048
Auxiliary Size in Bytes: 16
Auxiliary Status Offset: 12
Block Mark Byte Offset : 1999
Block Mark Bit Offset  : 0
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -110
nand: timing mode 5 not acknowledged by the NAND chip

What is the final timing mode used? Most of us tested in mode 5 I
guess; maybe mode 4 is broken (I don't know if this is the one used here,
nor why mode 5 is refused). Can you please try limiting the mode
to 0, 1, 2... until, hopefully, we narrow it down to the failing mode?

Sure, how to do that?

This loop [1] tries to configure each mode (5, 4, ...) until one
succeeds (default is 0: must always work). Please try to limit mode to
0, 1, etc.

Mode 0 should work.

[1] https://elixir.bootlin.com/linux/v5.3-rc1/source/drivers/mtd/nand/raw/nand_base.c#L933

The normal behavior - which usually works - has
chip->onfi_timing_mode_default=5 here. So in other words on the first pass
through this loop it is checking mode 5, and setting it as the default.

I am running a test/reboot loop now waiting for failure to see
if it is still using mode 5 in that case.

Regards
Greg



gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
Scanning device for bad blocks
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
....
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
gpmi-nand 1806000.gpmi-nand: Chip: 0, Error -22
5 fixed-partitions partitions found on MTD device gpmi-nand
Creating 5 MTD partitions on "gpmi-nand":
0x000000000000-0x000000500000 : "u-boot"
0x000000500000-0x000000600000 : "u-boot-env"
0x000000600000-0x000000800000 : "log"
0x000000800000-0x000010000000 : "flash"
0x000000000000-0x000010000000 : "all"
gpmi-nand 1806000.gpmi-nand: driver registered.


This is using Linux kernel v5.1.14. I have seen this happen on
a number of boards I have here - but it is only occasional. It
only happens once in a while on boot, maybe 1 in 40 or more times.
So it can take quite a while to reproduce (using a boot loop setup).

That's strange... I don't get what would produce such an unstable issue.

My initial guess is that the calculated timing is very marginal.

What do you mean by "marginal"?

The problem seems more likely to happen if flash write activity
had been occurring just before a soft reboot. It's not a guarantee,
just more likely.

That's really disturbing. I doubt this is the real cause though.


An interesting observation is that Michael was using Micron flash,
and the boards that I have with the problem also have Micron flash.
Both are a form of Micron MT29F2G08.

I have similar boards, iMX6ull based, with different brands of
NAND flash and I have not seen any problem on them.

That's great to narrow down the root cause. Maybe these chips have
tighter timing constraints.


Regards
Greg



As per the email thread I pointed to above I looked at reverting
those patches, but that was not at all easy given how much the gpmi
driver code had moved. So instead I modified the code with this:

--- a/linux/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
+++ b/linux/drivers/mtd/nand/raw/gpmi-nand/gpmi-lib.c
@@ -481,6 +481,7 @@ static void gpmi_nfc_compute_timings(struct gpmi_nand_data *this,
 
 void gpmi_nfc_apply_timings(struct gpmi_nand_data *this)
 {
+#if 0
 	struct gpmi_nfc_hardware_timing *hw = &this->hw;
 	struct resources *r = &this->resources;
 	void __iomem *gpmi_regs = r->gpmi_regs;
@@ -505,6 +512,7 @@ void gpmi_nfc_apply_timings(struct gpmi_nand_data *this)
 
 	/* Wait for the DLL to settle. */
 	udelay(dll_wait_time_us);
+#endif
 }
 
 int gpmi_setup_data_interface(struct nand_chip *chip, int chipnr,

So far after a couple of days of testing with this I no longer
see the DMA timeout.

Any thoughts?

Regards
Greg

Thanks,
Miquèl

Thanks,
Miquèl

