Hi,

[add i.MX architecture maintainers to Cc]

On Tue, Sep 01, 2020 at 07:37:31AM +0000, Baumgartner, Claus (GE Healthcare) wrote:
> We have a board with an i.MX535 using a Samsung eMMC as persistent
> storage connected to eSDHCv3. Every now and then we produce a
> build that causes emmc timeouts:
>
> Aug 28 07:32:12 csmon kernel: mmc0: Timeout waiting for hardware cmd interrupt.
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Sys addr: 0xe3f12000 | Version: 0x00001201
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Blk size: 0x00000200 | Blk cnt: 0x00000001
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Argument: 0x00010000 | Trn mode: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Present: 0x01f80008 | Host ctl: 0x00000031
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Power: 0x00000002 | Blk gap: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x0000011f
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Timeout: 0x0000008e | Int stat: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Int enab: 0x107f000b | Sig enab: 0x107f000b
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00001201
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Caps: 0x07eb0000 | Caps_1: 0x08100810
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[2]: 0x00000000 | Resp[3]: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Host ctl2: 0x00000000
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xef041208
> Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============================================

Some extra information: The timeout always has cmd = 0x00000d1a
(MMC_SEND_STATUS) and resp[0] = 0x00400900, with resp[0] translating to
this IIUIC:

Bit 8 = Ready for data
Bit 11 = CURRENT_STATE is TRAN
Bit 22 = Illegal command

> Timeouts do not occur with every build. After some debugging I
> have found that timeouts seem to depend on code alignment of the
> esdhc_readl_le function. I have bisected the behavior by using the
> System.map and moving/padding the code with NOP instructions (mov
> r0,r0).
>
> My test case has 5 processes continuously creating a file, writing
> random long data, reading data and deleting the file. It seems
> that when the esdhc_writel_le is aligned on a certain address then
> the timeout will occur about 5 times/12h using my test case. If I
> add one more NOP, the timeout will not occur at all. If I continue
> adding some more NOPs the timeouts come back. Seems that it
> doesn't matter where in the code I add NOPs as long as the address
> is below the address of esdhc_writel_le.
>
> We also run the same software on a dual core i.MX6 without any
> timeout issues.

And the same kernel binary is also used on an i.MX6 single core (albeit
with different SW) without triggering the problem so far.

> I have reproduced this with kernel version 4.19.94 and 5.8.3 and
> we have compiled with both gcc8 and gcc9. I'm still searching for
> the root cause and I would appreciate any thoughts about where to
> go next.
>
> Thanks,
>
> -Claus-

To me it looks like it might involve an unknown hardware erratum for
i.MX53, but there has been one similar report before (unfortunately
without the full register dump) involving virtualization:

https://patchwork.kernel.org/patch/10705823/

Note that Claus' kernel has been built with CONFIG_PREEMPT_NONE=y.

-- 
Sebastian
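
For reference, the decoding of cmd and resp[0] given above can be double-checked with a short script. This is only a sketch: the bit positions follow the R1 card status layout as named in the kernel's include/linux/mmc/mmc.h (READY_FOR_DATA at bit 8, CURRENT_STATE in bits 12:9, ILLEGAL_COMMAND at bit 22), the command index is assumed to sit in bits 13:8 of the dumped Cmd value as in the standard SDHCI command register, and the helper names decode_r1 and command_index are made up for illustration.

```python
# R1 card status flags (bit positions as in include/linux/mmc/mmc.h).
R1_READY_FOR_DATA = 1 << 8
R1_ILLEGAL_COMMAND = 1 << 22

# CURRENT_STATE lives in bits 12:9 of the R1 response.
STATE_NAMES = ["idle", "ready", "ident", "stby", "tran",
               "data", "rcv", "prg", "dis"]


def decode_r1(status):
    """Hypothetical helper: split an R1 status word into the fields above."""
    state = (status >> 9) & 0xF
    if state < len(STATE_NAMES):
        name = STATE_NAMES[state]
    else:
        name = "other"  # states beyond "dis" are not named in this sketch
    return {
        "ready_for_data": bool(status & R1_READY_FOR_DATA),
        "illegal_command": bool(status & R1_ILLEGAL_COMMAND),
        "current_state": name,
    }


def command_index(cmd_reg):
    """Hypothetical helper: assumes the index is in bits 13:8 of Cmd."""
    return (cmd_reg >> 8) & 0x3F


if __name__ == "__main__":
    print(decode_r1(0x00400900))   # Resp[0] from the register dump
    print(command_index(0x0D1A))   # Cmd from the register dump
```

With the values from the dump this gives state "tran" with both READY_FOR_DATA and ILLEGAL_COMMAND set, and command index 13 (MMC_SEND_STATUS), which matches the reading above.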