Re: [PATCH] mmc: Try power cycling card if command request times out

Ulf Hansson <ulf.hansson@xxxxxxxxxx> · Mon, 1 Mar 2021 09:50:56 +0100

+ Adrian

On Tue, 16 Feb 2021 at 23:43, Mårten Lindahl <marten.lindahl@xxxxxxxx> wrote:
>
> Sometimes an SD card that has been run for a long time enters a state
> from which it cannot recover by itself, and needs a power cycle to be
> operational again. Card status analysis has indicated that the card can
> end up in a state where it ignores all external commands because it is
> halted by data timeouts.
>
> If the card has been heavily used for a long time it can be worn out,
> and should typically be replaced. But some tests show that the card can
> still be functional after a power cycle. Since that requires an
> operator, the card can remain in a non-operational state for a long
> time until the operator notices the problem.
>
> This patch adds a function to power cycle the card in case it does not
> respond to a command, and then resend the command if the power cycle
> was successful. This procedure is tried once before giving up and
> resuming host operation as normal.

I assume the context above is all about the ioctl interface?

So, when the card enters this non-functional state, have you tried
just reading a block through the regular I/O interface? Does that
trigger a power cycle of the card and then make it functional
again?
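
For reference, a minimal user-space sketch of such a read test. The
helper name read_one_block() is made up for illustration, and the
device path you pass (for example /dev/mmcblk0) is an assumption about
your setup. Note that a plain read like this may be served from the
page cache, so for a real test you would probably want O_DIRECT or to
drop caches first.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 512-byte block at index 'block' from
 * 'path' through the regular block I/O path.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int read_one_block(const char *path, off_t block, unsigned char *buf)
{
	ssize_t n;
	int err = 0;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (lseek(fd, block * 512, SEEK_SET) < 0) {
		err = -errno;
		goto out;
	}

	n = read(fd, buf, 512);
	if (n < 0)
		err = -errno;	/* e.g. -EIO or -ETIMEDOUT from the driver */
	else if (n != 512)
		err = -EIO;	/* short read, treat as an I/O error */
out:
	close(fd);
	return err;
}
```

Calling read_one_block("/dev/mmcblk0", 0, buf) with a 512-byte buffer
and checking both the return value and the kernel log would show
whether the block layer recovery kicks in.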

>
> Signed-off-by: Mårten Lindahl <marten.lindahl@xxxxxxxx>
> ---
> Please note: This might not be the way we want to handle these cases,
> but at least it lets us start the discussion. In which cases should the
> mmc framework deal with error messages like ETIMEDOUT, and in which
> cases should it be handled by userspace?
> The mmc framework tries to recover a failed block request
> (mmc_blk_mq_rw_recovery) which may end up in a HW reset of the card.
> Would it be an idea to act in a similar way when an ioctl times out?

Maybe it's a good idea to allow a similar reset for ioctls as we do
for regular I/O requests. My concern with this, though, is that we
might allow user space to trigger HW resets a bit too easily - and
that could damage the card.

Did you consider this?
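
To make the concern concrete, one conceivable way to bound it would be
a small reset budget per time window. The sketch below is plain C that
only models the idea. struct reset_budget and reset_budget_try() are
hypothetical names, nothing like them exists in the mmc core, and the
window and limit values would need real thought.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical budget: at most max_resets resets per window_ms. */
struct reset_budget {
	uint64_t window_start_ms;	/* start of the current window */
	uint64_t window_ms;		/* window length in milliseconds */
	unsigned int max_resets;	/* resets allowed per window */
	unsigned int used;		/* resets consumed in this window */
};

/*
 * Returns true and consumes one slot if a reset is still allowed at
 * time now_ms, false if the budget for the current window is spent.
 */
static bool reset_budget_try(struct reset_budget *b, uint64_t now_ms)
{
	/* Start a fresh window once the previous one has elapsed. */
	if (now_ms - b->window_start_ms >= b->window_ms) {
		b->window_start_ms = now_ms;
		b->used = 0;
	}

	if (b->used >= b->max_resets)
		return false;

	b->used++;
	return true;
}
```

An ioctl path would then only attempt a reset when reset_budget_try()
returns true, and otherwise just return the error to user space.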

>
>  drivers/mmc/core/block.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 42e27a298218..d007b2af64d6 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -976,6 +976,7 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
>   */
>  static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>  {
> +       int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>         struct mmc_queue_req *mq_rq;
>         struct mmc_card *card = mq->card;
>         struct mmc_blk_data *md = mq->blkdata;
> @@ -983,7 +984,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>         bool rpmb_ioctl;
>         u8 **ext_csd;
>         u32 status;
> -       int ret;
> +       int ret, retry = 1;
>         int i;
>
>         mq_rq = req_to_mmc_queue_req(req);
> @@ -994,9 +995,24 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>         case MMC_DRV_OP_IOCTL_RPMB:
>                 idata = mq_rq->drv_op_data;
>                 for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) {
> +cmd_do:
>                         ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
> -                       if (ret)
> +                       if (ret == -ETIMEDOUT) {
> +                               dev_warn(mmc_dev(card->host),
> +                                        "error %d sending command\n", ret);
> +cmd_reset:
> +                               mmc_blk_reset_success(md, type);
> +                               if (retry--) {
> +                                       dev_warn(mmc_dev(card->host),
> +                                                "power cycling card\n");
> +                                       if (mmc_blk_reset
> +                                           (md, card->host, type))
> +                                               goto cmd_reset;
> +                                       mmc_blk_reset_success(md, type);
> +                                       goto cmd_do;
> +                               }
>                                 break;
> +                       }
>                 }
>                 /* Always switch back to main area after RPMB access */
>                 if (rpmb_ioctl)
> --
> 2.11.0
>
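
As far as the hunk above shows, the "if (ret) break;" is replaced by
the -ETIMEDOUT check, so any other error no longer ends the loop. The
retry-once flow could perhaps be written without the gotos. Below is a
stand-alone user-space model of that control flow, not a proposed
replacement. send_fn, reset_fn, run_cmds() and the fake_* helpers are
all made up for illustration and only stand in for
__mmc_blk_ioctl_cmd() and mmc_blk_reset().

```c
#include <errno.h>

/* Hypothetical stand-ins for the command and reset calls. */
typedef int (*send_fn)(int idx, void *ctx);
typedef int (*reset_fn)(void *ctx);

/* Fake card state used to exercise the loop. */
struct fake_card {
	int timeouts_left;	/* sends that still time out */
	int resets;		/* number of resets performed */
	int sent;		/* commands that completed */
};

static int fake_send(int idx, void *ctx)
{
	struct fake_card *c = ctx;

	(void)idx;
	if (c->timeouts_left > 0) {
		c->timeouts_left--;
		return -ETIMEDOUT;
	}
	c->sent++;
	return 0;
}

static int fake_reset(void *ctx)
{
	struct fake_card *c = ctx;

	c->resets++;
	return 0;
}

/*
 * Send 'count' commands. On the first timeout, reset once and resend
 * that command. Any remaining error, timeout or not, ends the loop.
 */
static int run_cmds(int count, send_fn send, reset_fn reset, void *ctx)
{
	int retry = 1;
	int ret = 0;
	int i;

	for (i = 0; i < count; i++) {
		ret = send(i, ctx);
		if (ret == -ETIMEDOUT && retry > 0) {
			retry--;
			if (reset(ctx) == 0)
				ret = send(i, ctx);
		}
		if (ret)
			break;
	}
	return ret;
}
```

With a fake card that times out once, run_cmds(3, fake_send,
fake_reset, &card) performs one reset and completes all three
commands. With a card that keeps timing out, it gives up after a
single reset and returns -ETIMEDOUT.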

Kind regards
Uffe