RE: IMX8MM eMMC CQHCI timeout

> -----Original Message-----
> From: Tim Harvey [mailto:tharvey@xxxxxxxxxxxxx]
> Sent: November 4, 2021 0:50
> To: Bough Chen <haibo.chen@xxxxxxx>
> Cc: Linux MMC List <linux-mmc@xxxxxxxxxxxxxxx>; Marcel Ziswiler
> <marcel@xxxxxxxxxxxx>; Fabio Estevam <festevam@xxxxxxxxx>; Schrempf
> Frieder <frieder.schrempf@xxxxxxxxxx>; Adam Ford <aford173@xxxxxxxxx>;
> Lucas Stach <l.stach@xxxxxxxxxxxxxx>; Peng Fan <peng.fan@xxxxxxx>; Frank
> Li <frank.li@xxxxxxx>; Adrian Hunter <adrian.hunter@xxxxxxxxx>; Shawn Guo
> <shawnguo@xxxxxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; Sascha
> Hauer <s.hauer@xxxxxxxxxxxxxx>; Pengutronix Kernel Team
> <kernel@xxxxxxxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Cale Collins
> <ccollins@xxxxxxxxxxxxx>
> Subject: Re: IMX8MM eMMC CQHCI timeout
> 
> On Sun, Oct 31, 2021 at 6:57 PM Bough Chen <haibo.chen@xxxxxxx> wrote:
> >
> > > -----Original Message-----
> > > From: Tim Harvey [mailto:tharvey@xxxxxxxxxxxxx]
> > > Sent: October 30, 2021 4:47
> > > To: Linux MMC List <linux-mmc@xxxxxxxxxxxxxxx>; Marcel Ziswiler
> > > <marcel@xxxxxxxxxxxx>; Fabio Estevam <festevam@xxxxxxxxx>; Schrempf
> > > Frieder <frieder.schrempf@xxxxxxxxxx>; Adam Ford
> > > <aford173@xxxxxxxxx>; Bough Chen <haibo.chen@xxxxxxx>; Lucas Stach
> > > <l.stach@xxxxxxxxxxxxxx>; Peng Fan <peng.fan@xxxxxxx>; Frank Li
> > > <frank.li@xxxxxxx>
> > > Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>; Shawn Guo
> > > <shawnguo@xxxxxxxxxx>; Ulf Hansson <ulf.hansson@xxxxxxxxxx>; Sascha
> > > Hauer <s.hauer@xxxxxxxxxxxxxx>; Pengutronix Kernel Team
> > > <kernel@xxxxxxxxxxxxxx>; dl-linux-imx <linux-imx@xxxxxxx>; Cale
> > > Collins <ccollins@xxxxxxxxxxxxx>
> > > Subject: IMX8MM eMMC CQHCI timeout
> > >
> > > Greetings,
> > >
> > > I've encountered the following MMC CQHCI timeout message a couple of
> > > times now on IMX8MM boards with eMMC with a 5.10 based kernel:
> > >
> > > [  224.356283] mmc2: cqhci: ============ CQHCI REGISTER DUMP ===========
> > > [  224.362764] mmc2: cqhci: Caps:      0x0000310a | Version:  0x00000510
> > > [  224.369250] mmc2: cqhci: Config:    0x00001001 | Control:  0x00000000
> > > [  224.375726] mmc2: cqhci: Int stat:  0x00000000 | Int enab: 0x00000006
> > > [  224.382197] mmc2: cqhci: Int sig:   0x00000006 | Int Coal: 0x00000000
> > > [  224.388665] mmc2: cqhci: TDL base:  0x8003f000 | TDL up32: 0x00000000
> > > [  224.395129] mmc2: cqhci: Doorbell:  0xbf01dfff | TCN:      0x00000000
> > > [  224.401598] mmc2: cqhci: Dev queue: 0x00000000 | Dev Pend: 0x08000000
> > > [  224.408064] mmc2: cqhci: Task clr:  0x00000000 | SSC1:     0x00011000
> > > [  224.414532] mmc2: cqhci: SSC2:      0x00000001 | DCMD rsp: 0x00000800
> > > [  224.420997] mmc2: cqhci: RED mask:  0xfdf9a080 | TERRI:    0x00000000
> > > [  224.427467] mmc2: cqhci: Resp idx:  0x0000000d | Resp arg: 0x00000000
> > > [  224.433934] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
> > > [  224.440404] mmc2: sdhci: Sys addr:  0x7c722000 | Version:  0x00000002
> > > [  224.446877] mmc2: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000020
> > > [  224.453346] mmc2: sdhci: Argument:  0x00018000 | Trn mode: 0x00000023
> > > [  224.459811] mmc2: sdhci: Present:   0x01f88008 | Host ctl: 0x00000030
> > > [  224.466281] mmc2: sdhci: Power:     0x00000002 | Blk gap:  0x00000080
> > > [  224.472752] mmc2: sdhci: Wake-up:   0x00000008 | Clock:    0x0000000f
> > > [  224.479225] mmc2: sdhci: Timeout:   0x0000008f | Int stat: 0x00000000
> > > [  224.485690] mmc2: sdhci: Int enab:  0x107f4000 | Sig enab: 0x107f4000
> > > [  224.492161] mmc2: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000502
> > > [  224.498628] mmc2: sdhci: Caps:      0x07eb0000 | Caps_1:   0x8000b407
> > > [  224.505097] mmc2: sdhci: Cmd:       0x00000d1a | Max curr: 0x00ffffff
> > > [  224.511575] mmc2: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0xffc003ff
> > > [  224.518043] mmc2: sdhci: Resp[2]:   0x328f5903 | Resp[3]:  0x00d07f01
> > > [  224.524512] mmc2: sdhci: Host ctl2: 0x00000088
> > > [  224.528986] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0xfe179020
> > > [  224.535451] mmc2: sdhci-esdhc-imx: ========= ESDHC IMX DEBUG STATUS DUMP ====
> > > [  224.543052] mmc2: sdhci-esdhc-imx: cmd debug status:  0x2120
> > > [  224.548740] mmc2: sdhci-esdhc-imx: data debug status:  0x2200
> > > [  224.554510] mmc2: sdhci-esdhc-imx: trans debug status:  0x2300
> > > [  224.560368] mmc2: sdhci-esdhc-imx: dma debug status:  0x2400
> > > [  224.566054] mmc2: sdhci-esdhc-imx: adma debug status:  0x2510
> > > [  224.571826] mmc2: sdhci-esdhc-imx: fifo debug status:  0x2680
> > > [  224.577608] mmc2: sdhci-esdhc-imx: async fifo debug status:  0x2750
> > > [  224.583900] mmc2: sdhci: ============================================
> > >
> > > I don't know how to make the issue occur; both times it occurred
> > > while simply reading a file in the rootfs ext4 fs on the eMMC.
> > >
> > > Some research shows:
> > > - https://community.nxp.com/t5/i-MX-Processors/The-issues-on-quot-mmc0-cqhci-timeout-for-tag-0-quot/m-p/993779
> > > - http://git.toradex.com/cgit/linux-toradex.git/commit/?h=toradex_5.4-2.3.x-imx&id=fd33531be843566c59a5fc655f204bbd36d7f3c6
> > >
> > > I'm not clear if this info is up-to-date. The NXP 5.4 kernel did not
> > > enable this feature, but if I'm not mistaken CQHCI support itself
> > > didn't land in mainline until a later kernel, so it would make sense
> > > it was not enabled at that time. I do see the NXP 5.10 kernels have
> > > this enabled, so I'm curious if it is an issue there.
> > >
> > > Do any other IMX8MM or other SoC users know what this could be about,
> > > or what I could do for a test to try to reproduce it so I can see if
> > > it occurs in other kernel versions?
> >
> > Hi Tim,
> >
> > I have been debugging this issue recently, but unfortunately I have not
> > yet found the root cause.
> > The register values of Doorbell, Dev queue, and Dev Pend seem abnormal.
> > This issue happens on all i.MX SoCs that support the cmdq feature when
> > CPU load is high. I currently lack an MMC logic analyzer, which makes
> > this issue hard to debug, so I still need some time. Sorry about that.
> > If you want mmc to work stably, you can disable cmdq as a
> > workaround.
> >
> > Best Regards
> > Haibo Chen
> 
> Haibo,
> 
> Thanks for the information. Do you know how to reproduce it reliably
> for testing?

Not yet. I can only hit this issue randomly, after a few hours of stress
testing under high CPU load.
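
Since the failure only shows up at random during a long stress run, it can help to scan the kernel log automatically instead of watching it by hand. The following is a minimal sketch, not part of any existing tool: the function name find_cqhci_dumps and the regular expression are assumptions for illustration, and the pattern is based only on the dump header line quoted in this thread.

```python
import re

# Matches the header line that opens a CQHCI register dump, e.g.
# "[  224.356283] mmc2: cqhci: ============ CQHCI REGISTER DUMP ==========="
# The pattern is an assumption derived from the log quoted in this thread.
DUMP_HEADER = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<host>mmc\d+): cqhci: =+ CQHCI REGISTER DUMP"
)


def find_cqhci_dumps(log_text):
    """Return a (timestamp_seconds, host) tuple for each dump header found."""
    hits = []
    for line in log_text.splitlines():
        match = DUMP_HEADER.search(line)
        if match:
            hits.append((float(match.group("ts")), match.group("host")))
    return hits
```

The function takes the kernel log as a string (for example, captured dmesg output saved during the stress test) and reports when and on which host a dump appeared.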

My next steps are:
1. Find a way to reproduce this issue easily.
2. Get an eMMC logic analyzer.
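
Until a logic analyzer is available, the Doorbell and Dev Pend values from the register dump earlier in this thread can at least be decoded by hand. The sketch below assumes that bit n in each register corresponds to task tag n. That is an assumption about the CQHCI register layout, not something stated in this thread, and the helper name set_bits is hypothetical.

```python
def set_bits(value, width=32):
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(width) if value & (1 << bit)]


# Values copied from the register dump quoted earlier in this thread.
DOORBELL = 0xBF01DFFF
DEV_PEND = 0x08000000

# Assumption: bit n in each register refers to task tag n.
doorbell_tags = set_bits(DOORBELL)
pending_tags = set_bits(DEV_PEND)

# Tags with the doorbell bit set that are not marked pending in the device.
not_pending = [tag for tag in doorbell_tags if tag not in pending_tags]

print("doorbell tags:", doorbell_tags)
print("device pending tags:", pending_tags)
print("doorbell set but not pending:", not_pending)
```

Under that assumption, the doorbell has many tags outstanding while only tag 27 is marked pending in the device.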


> 
> I have tried the following on an eMMC filesystem:
> stress --cpu 32 --io 32 &
> dd if=/dev/zero of=foo bs=1M count=1000 & dd if=/dev/zero of=foo bs=1M
> count=1000 & rm foo
> 
> I'm unable to reproduce the issue that way, and it has only happened
> randomly once or twice.
> 
> Perhaps we should disable CMDQ for now until you can sort this out? I can
> submit a patch for that.

Yes, please.

Best Regards
Haibo Chen
> 
> Best regards,
> 
> Tim


