Re: [BUG] mmc: core: adjust polling interval for CMD1

Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx> · Wed, 2 Mar 2022 09:20:34 +0100

Ulf Hansson wrote on Tue  1/03/22 14:38:
> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> > Jean Rene Dawin did report to me a problem on the Beagle Bone Black starting
> > with our disto kernel based on v5.17-rc1:
> >
> > >> since kernel 5.17-rc1 I noticed slower emmc performance on Beaglebone
> > >> Black, but didn't check the logs.
> > >> When I tried to run 5.17.0-rc3-letux+ it booted fine, but during IO
> > >> traffic there were messages like
> > >>
> > >> [  662.529584] mmc1: error -110 doing runtime resume
> > >> [  669.293590] mmc1: Card stuck being busy! __mmc_poll_for_busy
> > >>
> > >> [  739.076072] mmc1: Timeout waiting for hardware interrupt.
> > >> [  739.145676] mmc1: sdhci: ============ SDHCI REGISTER DUMP ===========
> > >> [  739.231053] mmc1: sdhci: Sys addr:  0x00000000 | Version:  0x00003101
> > >> [  739.316472] mmc1: sdhci: Blk size:  0x00000200 | Blk cnt:  0x00000400
> > >> [  739.401937] mmc1: sdhci: Argument:  0x00342d30 | Trn mode: 0x00000023
> > >> [  739.487439] mmc1: sdhci: Present:   0x01f70000 | Host ctl: 0x00000000
> > >> [  739.573007] mmc1: sdhci: Power:     0x0000000f | Blk gap:  0x00000000
> > >> [  739.658609] mmc1: sdhci: Wake-up:   0x00000000 | Clock:    0x00003c07
> > >> [  739.744224] mmc1: sdhci: Timeout:   0x00000007 | Int stat: 0x00000002
> > >> [  739.829896] mmc1: sdhci: Int enab:  0x027f000b | Sig enab: 0x027f000b
> > >> [  739.915623] mmc1: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> > >> [  740.001394] mmc1: sdhci: Caps:      0x07e10080 | Caps_1:   0x00000000
> > >> [  740.087208] mmc1: sdhci: Cmd:       0x0000193a | Max curr: 0x00000000
> > >> [  740.173051] mmc1: sdhci: Resp[0]:   0x00000900 | Resp[1]:  0x00000000
> > >> [  740.258928] mmc1: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
> > >> [  740.344854] mmc1: sdhci: Host ctl2: 0x00000000
> > >> [  740.402796] mmc1: sdhci: ============================================
> > >>
> > >> and finally IO errors and a corrupted filesystem.
> > >>
> > >> 5.17.0-rc4-letux+ shows the same behaviour.
> >
> > I checked with my BeagleBoard Black (am3358) and can confirm this observation.
> > It happens only with the integrated eMMC but not with the µSD connected to
> > the other mmc interface.
> >
> > A git bisect found:
> >
> > 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722 is the first bad commit
> > commit 76bfc7ccc2fa9d382576f6013b57a0ef93d5a722
> > Author: Huijin Park <huijin.park@xxxxxxxxxxx>
> > Date:   Thu Nov 4 15:32:31 2021 +0900
> >
> >   mmc: core: adjust polling interval for CMD1
> >
> >   In mmc_send_op_cond(), loops are continuously performed at the same
> >   interval of 10 ms.  However the behaviour is not good for some eMMC
> >   which can be out from a busy state earlier than 10 ms if normal.
> >
> >   Rather than fixing about the interval time in mmc_send_op_cond(),
> >   let's instead convert into using the common __mmc_poll_for_busy().
> >
> >   The reason for adjusting the interval time is that it is important
> >   to reduce the eMMC initialization time, especially in devices that
> >   use eMMC as rootfs.
> >
> >   Test log(eMMC:KLM8G1GETF-B041):
> >
> >   before: 12 ms (0.311016 - 0.298729)
> >   [    0.295823] mmc0: starting CMD0 arg 00000000 flags 000000c0
> >   [    0.298729] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> >   [    0.311016] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> >   [    0.311336] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> >   after: 2 ms (0.301270 - 0.298762)
> >   [    0.295862] mmc0: starting CMD0 arg 00000000 flags 000000c0
> >   [    0.298762] mmc0: starting CMD1 arg 40000080 flags 000000e1<-start
> >   [    0.299067] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.299441] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.299879] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.300446] mmc0: starting CMD1 arg 40000080 flags 000000e1
> >   [    0.301270] mmc0: starting CMD1 arg 40000080 flags 000000e1<-finish
> >   [    0.301572] mmc0: starting CMD2 arg 00000000 flags 00000007
> >
> >   Signed-off-by: Huijin Park <huijin.park@xxxxxxxxxxx>
> >   Link: https://lore.kernel.org/r/20211104063231.2115-3-huijin.park@xxxxxxxxxxx
> >   Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >
> > Reverting this makes v5.17-rc[1-4] work.
> >
> > Any suggestions or fixes?
> >
> > BR and thanks,
> > Nikolaus Schaller
> >
> > Reported-by: jdawin@xxxxxxxxxxxxxxxxxxxxx
> >
> 
> Thanks for reporting and bisecting!
> 
> What changed with the offending commit is two things:
> 
> 1) We are sending the CMD1 more frequently, initially in the loop in
> the __mmc_poll_for_busy. Step by step, we increase the polling period.
> 2) We may end up using a slightly shorter total timeout for polling
> time, compared to what we used before the offending commit.
> 
> Hopefully the problem is related to 2), in which case I think the
> below patch should help. Can you please give it a try?
> 
> Kind regards
> Uffe
> 
> From: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> Date: Tue, 1 Mar 2022 14:24:21 +0100
> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> 
> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> which is probably more inline with its previous value, to fix the reported
> timeout problems.
> 
> While at it, let's add a define for the timeout value, rather than using
> a hard-coded value for it.
> 
> Reported-by: Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>
> Reported-by: H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>
> Cc: Huijin Park <huijin.park@xxxxxxxxxxx>
> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> ---
>  drivers/mmc/core/mmc_ops.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..1f57174b3cf3 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
> 
>  #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
>  #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
> 
>  static const u8 tuning_blk_pattern_4bit[] = {
>         0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> ocr, u32 *rocr)
>         cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>         cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> 
> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> +                                 &__mmc_send_op_cond_cb, &cb_data);
>         if (err)
>                 return err;
> 
> -- 
> 2.25.1

Hi,

thanks. But testing with this patch still gives the same errors:

[   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
[   52.273380] mmc1: error -110 doing runtime resume

and the system gets stuck eventually.

Regards,
Jean Rene Dawin