+ Adrian

On Tue, 16 Feb 2021 at 23:43, Mårten Lindahl <marten.lindahl@xxxxxxxx> wrote:
>
> Sometimes SD cards that has been run for a long time enters a state
> where it cannot by itself be recovered, but needs a power cycle to be
> operational again. Card status analysis has indicated that the card can
> end up in a state where all external commands are ignored by the card
> since it is halted by data timeouts.
>
> If the card has been heavily used for a long time it can be weared out,
> and should typically be replaced. But on some tests, it shows that the
> card can still be functional after a power cycle, but as it requires an
> operator to do it, the card can remain in a non-operational state for a
> long time until the problem has been observed by the operator.
>
> This patch adds function to power cycle the card in case it does not
> respond to a command, and then resend the command if the power cycle
> was successful. This procedure will be tested 1 time before giving up,
> and resuming host operation as normal.

I assume the context above is all about the ioctl interface?

So, when the card enters this non-functional state, have you tried
just reading a block through the regular I/O interface? Does that
trigger a power cycle of the card, and does the card become functional
again afterwards?

>
> Signed-off-by: Mårten Lindahl <marten.lindahl@xxxxxxxx>
> ---
> Please note: This might not be the way we want to handle these cases,
> but at least it lets us start the discussion. In which cases should the
> mmc framework deal with error messages like ETIMEDOUT, and in which
> cases should it be handled by userspace?
> The mmc framework tries to recover a failed block request
> (mmc_blk_mq_rw_recovery) which may end up in a HW reset of the card.
> Would it be an idea to act in a similar way when an ioctl times out?

Maybe it's a good idea to allow a similar reset for ioctls as we do
for regular I/O requests.
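To make the discussion concrete, here is a minimal user-space sketch of
the "reset on timeout, then resend" flow, written as a loop with an
explicit reset budget. It is only an illustration under stated
assumptions: send_with_reset(), fake_send(), fake_reset() and struct
fake_card are hypothetical names and not part of the mmc core. In
block.c the corresponding calls would be __mmc_blk_ioctl_cmd() and
mmc_blk_reset().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the command and reset paths. */
typedef int (*send_cmd_fn)(void *ctx);
typedef int (*reset_card_fn)(void *ctx);

/*
 * Send a command. On -ETIMEDOUT, reset the card and resend, using at
 * most max_resets resets. Any other result is returned immediately.
 */
static int send_with_reset(send_cmd_fn send, reset_card_fn reset,
			   void *ctx, int max_resets)
{
	int ret;

	assert(send != NULL && reset != NULL);

	for (;;) {
		ret = send(ctx);
		if (ret != -ETIMEDOUT)
			return ret;	/* success or a non-timeout error */
		if (max_resets-- <= 0)
			return ret;	/* reset budget exhausted */
		if (reset(ctx))
			return ret;	/* reset failed, report the timeout */
	}
}

/* Fake card for illustration: times out until it has been reset once. */
struct fake_card {
	int resets;
	int sends;
};

static int fake_send(void *ctx)
{
	struct fake_card *c = ctx;

	c->sends++;
	return c->resets > 0 ? 0 : -ETIMEDOUT;
}

static int fake_reset(void *ctx)
{
	struct fake_card *c = ctx;

	c->resets++;
	return 0;
}
```

With a budget of one reset the fake card recovers on the second send.
With a budget of zero the timeout is returned to the caller untouched.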
My concern with this, though, is that we might allow user space to
trigger HW resets a bit too easily, and that could damage the card.
Did you consider this?

>
>  drivers/mmc/core/block.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 42e27a298218..d007b2af64d6 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -976,6 +976,7 @@ static inline void mmc_blk_reset_success(struct mmc_blk_data *md, int type)
>   */
>  static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>  {
> +	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
>  	struct mmc_queue_req *mq_rq;
>  	struct mmc_card *card = mq->card;
>  	struct mmc_blk_data *md = mq->blkdata;
> @@ -983,7 +984,7 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>  	bool rpmb_ioctl;
>  	u8 **ext_csd;
>  	u32 status;
> -	int ret;
> +	int ret, retry = 1;
>  	int i;
>
>  	mq_rq = req_to_mmc_queue_req(req);
> @@ -994,9 +995,24 @@ static void mmc_blk_issue_drv_op(struct mmc_queue *mq, struct request *req)
>  	case MMC_DRV_OP_IOCTL_RPMB:
>  		idata = mq_rq->drv_op_data;
>  		for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) {
> +cmd_do:
>  			ret = __mmc_blk_ioctl_cmd(card, md, idata[i]);
> -			if (ret)
> +			if (ret == -ETIMEDOUT) {
> +				dev_warn(mmc_dev(card->host),
> +					 "error %d sending command\n", ret);
> +cmd_reset:
> +				mmc_blk_reset_success(md, type);
> +				if (retry--) {
> +					dev_warn(mmc_dev(card->host),
> +						 "power cycling card\n");
> +					if (mmc_blk_reset
> +					    (md, card->host, type))
> +						goto cmd_reset;
> +					mmc_blk_reset_success(md, type);
> +					goto cmd_do;
> +				}
>  				break;
> +			}
>  		}
>  		/* Always switch back to main area after RPMB access */
>  		if (rpmb_ioctl)
> --
> 2.11.0
>

Kind regards
Uffe