Re: [BUG] mmc: core: adjust polling interval for CMD1

Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx> · Fri, 4 Mar 2022 10:42:19 +0100

Huijin Park wrote on Thu  3/03/22 21:16:
> Hi, sorry for the late reply.
> I guess the problem occurs depending on the eMMC model.
> Because I tested again and there was no problem.
> The eMMC model used for the test are as follows.
> (THGBMJG6C1LBAIL, KLM8G1GETF-B041)
> Anyway I guess the cause is interval time.
> I wrote a debug patch assuming that the reason was that some mmc models
> could not process CMD1 delivered at short intervals.
> I copied the polling function and adds interval minimum time parameter.
> I set the minimum time to 1ms. You can adjust it.
> Please test if there is no problem with the debug patch.
> 
> >Hi,
> >
> >> Am 02.03.2022 um 09:20 schrieb Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>:
> >> 
> >> Ulf Hansson wrote on Tue  1/03/22 14:38:
> >>> On Thu, 17 Feb 2022 at 21:12, H. Nikolaus Schaller <hns@xxxxxxxxxxxxx> wrote:
> >>>> 
> >>> 
> >>> From: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >>> Date: Tue, 1 Mar 2022 14:24:21 +0100
> >>> Subject: [PATCH] mmc: core: Extend timeout to 2s for MMC_SEND_OP_COND
> >>> 
> >>> It looks like the timeout for the MMC_SEND_OP_COND (CMD1) might have become
> >>> a bit too small due to recent changes. Therefore, let's extend it to 2s,
> >>> which is probably more inline with its previous value, to fix the reported
> >>> timeout problems.
> >>> 
> >>> While at it, let's add a define for the timeout value, rather than using
> >>> a hard-coded value for it.
> >>> 
> >>> Reported-by: Jean Rene Dawin <jdawin@xxxxxxxxxxxxxxxxxxxxx>
> >>> Reported-by: H. Nikolaus Schaller <hns@xxxxxxxxxxxxx>
> >>> Cc: Huijin Park <huijin.park@xxxxxxxxxxx>
> >>> Fixes: 76bfc7ccc2fa ("mmc: core: adjust polling interval for CMD1")
> >>> Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >>> ---
> >>> drivers/mmc/core/mmc_ops.c | 4 +++-
> >>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>> 
> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >>> index d63d1c735335..1f57174b3cf3 100644
> >>> --- a/drivers/mmc/core/mmc_ops.c
> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >>> @@ -21,6 +21,7 @@
> >>> 
> >>> #define MMC_BKOPS_TIMEOUT_MS           (120 * 1000) /* 120s */
> >>> #define MMC_SANITIZE_TIMEOUT_MS                (240 * 1000) /* 240s */
> >>> +#define MMC_OP_COND_TIMEOUT_MS         2000 /* 2s */
> >>> 
> >>> static const u8 tuning_blk_pattern_4bit[] = {
> >>>        0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> >>> @@ -232,7 +233,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32
> >>> ocr, u32 *rocr)
> >>>        cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
> >>>        cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
> >>> 
> >>> -       err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> >>> +       err = __mmc_poll_for_busy(host, MMC_OP_COND_TIMEOUT_MS,
> >>> +                                 &__mmc_send_op_cond_cb, &cb_data);
> >>>        if (err)
> >>>                return err;
> >>> 
> >>> -- 
> >>> 2.25.1
> >> 
> >> Hi,
> >> 
> >> thanks. But testing with this patch still gives the same errors:
> >> 
> >> [   52.259940] mmc1: Card stuck being busy! __mmc_poll_for_busy
> >> [   52.273380] mmc1: error -110 doing runtime resume
> >> 
> >> and the system gets stuck eventually.
> >
> >Same result from my tests.
> >
> >BR and thanks,
> >Nikolaus
> 
> 
> From c2458cb998dd8e275fefba52dd2532beb153c82d Mon Sep 17 00:00:00 2001
> From: Huijin Park <huijin.park@xxxxxxxxxxx>
> Date: Thu, 3 Mar 2022 20:43:22 +0900
> Subject: [PATCH] mmc: core: extend timeout and set min time for op_cond
> 
> This patch extends timeout to 2s and sets minimal interval time for
> checking op_cond stuck problem.
> 
> Signed-off-by: Huijin Park <huijin.park@xxxxxxxxxxx>
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index d63d1c735335..ccad6379d183 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -21,6 +21,7 @@
>  
>  #define MMC_BKOPS_TIMEOUT_MS		(120 * 1000) /* 120s */
>  #define MMC_SANITIZE_TIMEOUT_MS		(240 * 1000) /* 240s */
> +#define MMC_OP_COND_TIMEOUT_MS		(  2 * 1000) /*   2s */
>  
>  static const u8 tuning_blk_pattern_4bit[] = {
>  	0xff, 0x0f, 0xff, 0x00, 0xff, 0xcc, 0xc3, 0xcc,
> @@ -179,6 +180,47 @@ int mmc_go_idle(struct mmc_host *host)
>  	return err;
>  }
>  
> +static int ____mmc_poll_for_busy(struct mmc_host *host, unsigned int udelay_min,
> +				 unsigned int timeout_ms,
> +				 int (*busy_cb)(void *cb_data, bool *busy),
> +				 void *cb_data)
> +{
> +	int err;
> +	unsigned long timeout;
> +	unsigned int udelay = udelay_min, udelay_max = 32768;
> +	bool expired = false;
> +	bool busy = false;
> +
> +	timeout = jiffies + msecs_to_jiffies(timeout_ms) + 1;
> +	do {
> +		/*
> +		 * Due to the possibility of being preempted while polling,
> +		 * check the expiration time first.
> +		 */
> +		expired = time_after(jiffies, timeout);
> +
> +		err = (*busy_cb)(cb_data, &busy);
> +		if (err)
> +			return err;
> +
> +		/* Timeout if the device still remains busy. */
> +		if (expired && busy) {
> +			pr_err("%s: Card stuck being busy! %s\n",
> +				mmc_hostname(host), __func__);
> +			return -ETIMEDOUT;
> +		}
> +
> +		/* Throttle the polling rate to avoid hogging the CPU. */
> +		if (busy) {
> +			usleep_range(udelay, udelay * 2);
> +			if (udelay < udelay_max)
> +				udelay *= 2;
> +		}
> +	} while (busy);
> +
> +	return 0;
> +}
> +
>  static int __mmc_send_op_cond_cb(void *cb_data, bool *busy)
>  {
>  	struct mmc_op_cond_busy_data *data = cb_data;
> @@ -232,7 +274,8 @@ int mmc_send_op_cond(struct mmc_host *host, u32 ocr, u32 *rocr)
>  	cmd.arg = mmc_host_is_spi(host) ? 0 : ocr;
>  	cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R3 | MMC_CMD_BCR;
>  
> -	err = __mmc_poll_for_busy(host, 1000, &__mmc_send_op_cond_cb, &cb_data);
> +	err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
> +				    &__mmc_send_op_cond_cb, &cb_data);
>  	if (err)
>  		return err;
>  
> -- 
> 2.17.1
> 

Hi,

thanks for the patch. 
I started with a value of 10 ms :
       err = ____mmc_poll_for_busy(host, 10, MMC_OP_COND_TIMEOUT_MS,
                                    &__mmc_send_op_cond_cb, &cb_data);

and the result was agian

[   23.349932] mmc1: Card stuck being busy! __mmc_poll_for_busy
[   23.355936] mmc1: error -110 doing runtime resume

Same with 100 ms and 500 ms.

The messages seem to come from __mmc_poll_for_busy and not ____mmc_poll_for_busy.
Yet, changing the value in the call of ____mmc_poll_for_busy to 1000 ms changed the behaviour:

       err = ____mmc_poll_for_busy(host, 1000, MMC_OP_COND_TIMEOUT_MS,
                                   &__mmc_send_op_cond_cb, &cb_data);

With this I didn't get any "stuck" errors and the board seemed stable.

Regards,
Jean Rene Dawin