> -----Original Message-----
> From: Sebastian Reichel [mailto:sebastian.reichel@xxxxxxxxxxxxx]
> Sent: September 1, 2020 19:47
> To: dl-linux-imx <linux-imx@xxxxxxx>
> Cc: linux-mmc@xxxxxxxxxxxxxxx; Bough Chen <haibo.chen@xxxxxxx>; Shawn
> Guo <shawnguo@xxxxxxxxxx>; Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>;
> Pengutronix Kernel Team <kernel@xxxxxxxxxxxxxx>; Fabio Estevam
> <festevam@xxxxxxxxx>; Baumgartner, Claus (GE Healthcare)
> <claus.baumgartner@xxxxxxxxxx>
> Subject: Re: mmc0: Timeout waiting for hardware cmd interrupt on i.MX535
>
> Hi,
>
> [add i.MX architecture maintainers to Cc]
>
> On Tue, Sep 01, 2020 at 07:37:31AM +0000, Baumgartner, Claus (GE
> Healthcare) wrote:
> > We have a board with an i.MX535 using a Samsung eMMC as persistent
> > storage connected to eSDHCv3. Every now and then we produce a build
> > that causes emmc timeouts:
> >
> > Aug 28 07:32:12 csmon kernel: mmc0: Timeout waiting for hardware cmd interrupt.
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Sys addr:  0xe3f12000 | Version:  0x00001201
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000001
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Present:   0x01f80008 | Host ctl: 0x00000031
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000011f
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Timeout:   0x0000008e | Int stat: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Int enab:  0x107f000b | Sig enab: 0x107f000b
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00001201
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x08100810
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Host ctl2: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xef041208
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============================================
>
> Some extra information: The timeout always has cmd = 0x00000d1a
> (MMC_SEND_STATUS) and resp[0] = 0x00400900 with resp[0] translating to
> this IIUIC:
>
> Bit 8 = Ready for data
> Bit 11 = CURRENT_STATE is TRAN
> Bit 22 = Illegal command

According to the driver logic, a hardware cmd timeout on this CMD13 means
that the CMD13 did not get any response. The Resp[0] value in the dump
should therefore be the response to the previous command. That would mean
the previous command was an illegal command, which caused the eMMC
internal firmware to get stuck so that it could not respond to the
following CMD13.

I think we first need to identify the specific place in the eMMC driver
that triggers this log dump.

Best Regards
Haibo Chen

>
> > Timeouts do not occur with every build. After some debugging I have
> > found that timeouts seem to depend on code alignment of the
> > esdhc_readl_le function. I have bisected the behavior by using the
> > System.map and moving/padding the code with NOP instructions (mov
> > r0,r0).
> >
> > My test case has 5 processes continuously creating a file, writing
> > random long data, reading data and deleting the file. It seems that
> > when the esdhc_writel_le is aligned on a certain address then the
> > timeout will occur about 5 times/12h using my test case. If I add one
> > more NOP, the timeout will not occur at all. If I continue adding some
> > more NOPs the timeouts come back. Seems that it doesn't matter where
> > in the code I add NOPs as long as the address is below the address of
> > esdhc_writel_le.
> >
> > We also run the same software on a dual core i.MX6 without any timeout
> > issues.
>
> And the same kernel binary is also used on an i.MX6 single core (albeit with
> different SW) without triggering the problem so far.
>
> > I have reproduced this with kernel version 4.19.94 and 5.8.3 and we
> > have compiled with both gcc8 and gcc9. I'm still searching for the
> > root cause and I would appreciate any thoughts about where to go next.
> >
> > Thanks,
> >
> > -Claus-
>
> To me it looks like it might involve an unknown hardware errata for i.MX53, but
> there has been one similar report before (unfortunately without the full
> register dump) involving virtualization:
>
> https://patchwork.kernel.org/patch/10705823/
>
> Note, that Claus' kernel has been built with CONFIG_PREEMPT_NONE=y.
>
> -- Sebastian
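
For reference, the Cmd and Resp[0] values from the register dump quoted
above can be decoded with a few lines of Python. This is only a sketch:
the helper names decode_r1 and decode_sdhci_cmd are made up for this
mail, and the bit positions are my reading of the R1 card status layout
(bit 8 READY_FOR_DATA, bits 12:9 CURRENT_STATE, bit 22 ILLEGAL_COMMAND)
and of the SDHCI command register (command index in bits 13:8, flags in
the low byte). Please check them against the specs before relying on it.

```python
# Sketch only: helper names are hypothetical; bit positions are assumed
# from the R1 card status and SDHCI command register layouts.

# CURRENT_STATE values 0..8 of the R1 card status (bits 12:9).
R1_STATES = ["idle", "ready", "ident", "stby", "tran",
             "data", "rcv", "prg", "dis"]


def decode_r1(status):
    """Decode the three R1 card status fields discussed in this thread."""
    state = (status >> 9) & 0xF
    if state < len(R1_STATES):
        state_name = R1_STATES[state]
    else:
        state_name = "reserved(%d)" % state
    return {
        "ready_for_data": bool(status & (1 << 8)),
        "current_state": state_name,
        "illegal_command": bool(status & (1 << 22)),
    }


def decode_sdhci_cmd(reg):
    """Split the SDHCI command register into command index and flags."""
    return {
        "index": (reg >> 8) & 0x3F,       # command index, e.g. 13 for CMD13
        "resp_type": reg & 0x03,          # 0 none, 1 long, 2 short, 3 short+busy
        "crc_check": bool(reg & 0x08),
        "index_check": bool(reg & 0x10),
        "data_present": bool(reg & 0x20),
    }


if __name__ == "__main__":
    # Values taken from the register dump in this thread.
    print(decode_sdhci_cmd(0x00000D1A))
    print(decode_r1(0x00400900))
```

With the values from the dump, Cmd decodes to command index 13 with a
short response, and Resp[0] decodes to ready-for-data, state "tran" and
the illegal-command bit set, which matches the interpretation given in
the thread.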