RE: mmc0: Timeout waiting for hardware cmd interrupt on i.MX535


> -----Original Message-----
> From: Sebastian Reichel [mailto:sebastian.reichel@xxxxxxxxxxxxx]
> Sent: September 1, 2020 19:47
> To: dl-linux-imx <linux-imx@xxxxxxx>
> Cc: linux-mmc@xxxxxxxxxxxxxxx; Bough Chen <haibo.chen@xxxxxxx>; Shawn
> Guo <shawnguo@xxxxxxxxxx>; Sascha Hauer <s.hauer@xxxxxxxxxxxxxx>;
> Pengutronix Kernel Team <kernel@xxxxxxxxxxxxxx>; Fabio Estevam
> <festevam@xxxxxxxxx>; Baumgartner, Claus (GE Healthcare)
> <claus.baumgartner@xxxxxxxxxx>
> Subject: Re: mmc0: Timeout waiting for hardware cmd interrupt on i.MX535
> 
> Hi,
> 
> [add i.MX architecture maintainers to Cc]
> 
> On Tue, Sep 01, 2020 at 07:37:31AM +0000, Baumgartner, Claus (GE
> Healthcare) wrote:
> > We have a board with an i.MX535 using a Samsung eMMC as persistent
> > storage connected to eSDHCv3. Every now and then we produce a build
> > that causes emmc timeouts:
> >
> > Aug 28 07:32:12 csmon kernel: mmc0: Timeout waiting for hardware cmd interrupt.
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Sys addr:  0xe3f12000 | Version:  0x00001201
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000001
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Argument:  0x00010000 | Trn mode: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Present:   0x01f80008 | Host ctl: 0x00000031
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Power:     0x00000002 | Blk gap:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x0000011f
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Timeout:   0x0000008e | Int stat: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Int enab:  0x107f000b | Sig enab: 0x107f000b
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00001201
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Caps:      0x07eb0000 | Caps_1:   0x08100810
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Cmd:       0x00000d1a | Max curr: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[0]:   0x00400900 | Resp[1]:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: Host ctl2: 0x00000000
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xef041208
> > Aug 28 07:32:12 csmon kernel: mmc0: sdhci: ============================================
> 
> Some extra information: The timeout always has cmd = 0x00000d1a
> (MMC_SEND_STATUS) and resp[0] = 0x00400900 with resp[0] translating to
> this IIUIC:
> 
> Bit 8 = Ready for data
> Bit 11 = CURRENT_STATE is TRAN
> Bit 22 = Illegal command
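The field decoding above can be cross-checked with a small Python sketch. It assumes the dumped values follow the standard SDHCI register layout that the sdhci core prints (command index in bits 13:8 of the Command register, response type in bits 1:0) and the eMMC R1 card-status layout (CURRENT_STATE in bits 12:9). The helper names are hypothetical and are not kernel APIs.

```python
# Hypothetical helpers for decoding values from the SDHCI register dump.
# Bit positions assume the standard SDHCI Command register and the
# eMMC R1 card-status word; nothing here is taken from kernel code.

# eMMC CURRENT_STATE values (R1 bits 12:9), indexed by state number.
MMC_STATES = ["IDLE", "READY", "IDENT", "STBY", "TRAN", "DATA",
              "RCV", "PRG", "DIS", "BTST", "SLP"]

RESP_TYPES = {0: "none", 1: "136-bit", 2: "48-bit", 3: "48-bit busy"}


def decode_sdhci_cmd(reg: int) -> dict:
    """Split the 16-bit SDHCI Command register into its fields."""
    return {
        "index": (reg >> 8) & 0x3F,          # command index, bits 13:8
        "resp_type": RESP_TYPES[reg & 0x3],  # response type, bits 1:0
        "crc_check": bool(reg & 0x08),       # bit 3
        "index_check": bool(reg & 0x10),     # bit 4
        "data_present": bool(reg & 0x20),    # bit 5
    }


def decode_r1(status: int) -> dict:
    """Decode the R1 card-status bits discussed in the thread."""
    state = (status >> 9) & 0xF
    name = MMC_STATES[state] if state < len(MMC_STATES) else f"reserved({state})"
    return {
        "illegal_command": bool(status & (1 << 22)),
        "ready_for_data": bool(status & (1 << 8)),
        "current_state": name,
    }


def rca_from_arg(arg: int) -> int:
    """CMD13 (SEND_STATUS) carries the card's RCA in argument bits 31:16."""
    return arg >> 16


if __name__ == "__main__":
    print(decode_sdhci_cmd(0x0D1A))
    print(decode_r1(0x00400900))
    print(rca_from_arg(0x00010000))
```

Run on the dumped values, it reports command index 13 with a 48-bit, CRC- and index-checked response, RCA 1 in the argument, and a card state of TRAN with ILLEGAL_COMMAND and READY_FOR_DATA set, matching the bits listed above.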

According to the code logic, a hardware cmd timeout on this CMD13 means that the CMD13 did not get any response, so Resp[0] should still hold the previous command's response.
So this means the previous command was an illegal command, which caused the eMMC internal firmware to get stuck so that it could not respond to the next CMD13.
I think we first need to identify the specific place in the eMMC driver that triggers the log dump.
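The argument that Resp[0] is stale can be illustrated with a toy model. This is a hypothetical Python sketch, not driver or controller code: it only assumes that the response register is written when the card actually answers, so a command that gets no response leaves the previous response in place.

```python
from typing import Optional


class ToyHost:
    """Hypothetical model of a host controller's Resp[0] register."""

    def __init__(self) -> None:
        self.resp0 = 0

    def send(self, card_reply: Optional[int]) -> str:
        if card_reply is None:
            # The card never answered, so Resp[0] is not written and
            # keeps whatever the previous command left there.
            return "timeout"
        self.resp0 = card_reply
        return "ok"


if __name__ == "__main__":
    host = ToyHost()
    host.send(0x00400900)           # earlier command, answered by the card
    result = host.send(None)        # CMD13 that gets no response
    print(result, hex(host.resp0))  # timeout 0x400900
```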


Best Regards
Haibo Chen

> 
> > Timeouts do not occur with every build. After some debugging I have
> > found that timeouts seem to depend on code alignment of the
> > esdhc_readl_le function. I have bisected the behavior by using the
> > System.map and moving/padding the code with NOP instructions (mov
> > r0,r0).
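The address arithmetic behind this kind of bisection can be sketched in Python. Everything here is an assumption for illustration: the System.map lines and addresses are made up, the 64-byte granule is an arbitrary choice (the report does not say which alignment matters), and each `mov r0,r0` is taken as 4 bytes, which holds for an ARM-mode build but may not for a Thumb-2 kernel, where it can encode in 2 bytes. It also ignores any padding the compiler or linker inserts to keep functions aligned.

```python
# Hypothetical System.map excerpt; the addresses are illustrative only.
SAMPLE_MAP = """\
c05a1b00 t esdhc_readl_le
c05a1b24 t esdhc_writel_le
"""


def symbol_address(system_map: str, name: str) -> int:
    """Return the address of `name` from System.map-style text."""
    for line in system_map.splitlines():
        fields = line.split()
        # System.map lines look like: <hex address> <type> <symbol>
        if len(fields) == 3 and fields[2] == name:
            return int(fields[0], 16)
    raise KeyError(name)


def offset_in_granule(addr: int, granule: int = 64) -> int:
    """Offset of `addr` within an assumed alignment granule."""
    return addr % granule


def shifted(addr: int, nops: int, nop_size: int = 4) -> int:
    """Address after `nops` NOPs are inserted somewhere below `addr`."""
    return addr + nops * nop_size


if __name__ == "__main__":
    base = symbol_address(SAMPLE_MAP, "esdhc_writel_le")
    for n in range(4):
        a = shifted(base, n)
        print(f"{n} NOPs: {a:#010x}, offset {offset_in_granule(a)}")
```

Comparing the offsets of failing and passing builds this way might show whether the good/bad pattern repeats with a fixed period.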
> >
> > My test case has 5 processes continuously creating a file, writing
> > random long data, reading the data and deleting the file. It seems that
> > when esdhc_writel_le is aligned on a certain address, the timeout
> > occurs about 5 times per 12 hours with my test case. If I add one
> > more NOP, the timeout does not occur at all. If I continue adding
> > more NOPs, the timeouts come back. It seems that it doesn't matter where
> > in the code I add NOPs, as long as the address is below the address of
> > esdhc_writel_le.
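A workload similar to the one described can be sketched in Python. The block size, iteration count, the `os.fsync` call and the temporary directory are assumptions, not details from the report; to exercise the eMMC, the directory would have to live on a filesystem mounted from it. The read-back may also be served from the page cache rather than the device.

```python
import os
import tempfile
from multiprocessing import Pool


def worker(args):
    """Create, write, read back and delete one file, `iterations` times."""
    directory, worker_id, iterations, size = args
    path = os.path.join(directory, f"stress-{worker_id}.bin")
    for _ in range(iterations):
        data = os.urandom(size)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        with open(path, "rb") as f:
            if f.read() != data:
                return False
        os.remove(path)
    return True


def run_stress(directory, workers=5, iterations=10, size=1 << 20):
    """Run `workers` processes in parallel; True if every read-back matched."""
    jobs = [(directory, i, iterations, size) for i in range(workers)]
    with Pool(workers) as pool:
        return all(pool.map(worker, jobs))


if __name__ == "__main__":
    # Point this at a directory on the eMMC under test; a temp dir is used here.
    with tempfile.TemporaryDirectory() as d:
        print("ok" if run_stress(d, iterations=3, size=64 * 1024) else "MISMATCH")
```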
> >
> > We also run the same software on a dual core i.MX6 without any timeout
> > issues.
> 
> And the same kernel binary is also used on an i.MX6 single core (albeit with
> different SW) without triggering the problem so far.
> 
> > I have reproduced this with kernel version 4.19.94 and 5.8.3 and we
> > have compiled with both gcc8 and gcc9. I'm still searching for the
> > root cause and I would appreciate any thoughts about where to go next.
> >
> > Thanks,
> >
> > -Claus-
> 
> To me it looks like it might involve an unknown hardware erratum for i.MX53, but
> there has been one similar report before (unfortunately without the full
> register dump) involving virtualization:
> 
> https://patchwork.kernel.org/patch/10705823/
> 
> Note that Claus' kernel has been built with CONFIG_PREEMPT_NONE=y.
> 
> -- Sebastian



