On 2021-01-08 02:53, Jaegeuk Kim wrote:
From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
When non-fatal error like line-reset happens, ufshcd_err_handler()
starts to
abort tasks by ufshcd_try_to_abort_task(). When it tries to issue tm
request,
we've hit two warnings.
WARNING: CPU: 7 PID: 7 at block/blk-core.c:630
blk_get_request+0x68/0x70
WARNING: CPU: 4 PID: 157 at block/blk-mq-tag.c:82
blk_mq_get_tag+0x438/0x46c
After fixing the above warnings, I've hit another tm_cmd timeout, which
may be
caused by unstable controller state.
__ufshcd_issue_tm_cmd: task management cmd 0x80 timed-out
Then, ufshcd_err_handler() enters full reset, and I hit kernel stuck.
It turned
out ufshcd_print_trs() printed too many messages in console which
requires CPU
locks. Likewise hba->silence_err_logs, we need to avoid too verbose
messages.
Actually it came from ufshcd_transfer_rsp_status() when requeuing
commands back.
Indeed, this is actually not an error case, so let's fix it.
Fixes: 69a6c269c097 ("scsi: ufs: Use blk_{get,put}_request() to
allocate and free TMFs")
Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx>
It is really good to find out the root cause! Thanks for the fix.
Reviewed-by: Can Guo <cang@xxxxxxxxxxxxxx>
---
drivers/scsi/ufs/ufshcd.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index e6e7bdf99cd7..2a715f13fe1d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4996,7 +4996,8 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba,
struct ufshcd_lrb *lrbp)
break;
} /* end of switch */
- if ((host_byte(result) != DID_OK) && !hba->silence_err_logs)
+ if ((host_byte(result) != DID_OK) &&
+ (host_byte(result) != DID_REQUEUE) && !hba->silence_err_logs)
ufshcd_print_trs(hba, 1 << lrbp->task_tag, true);
return result;
}
@@ -6302,9 +6303,13 @@ static irqreturn_t ufshcd_intr(int irq, void
*__hba)
intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS);
}
- if (enabled_intr_status && retval == IRQ_NONE) {
- dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x\n",
- __func__, intr_status);
+ if (enabled_intr_status && retval == IRQ_NONE &&
+ !ufshcd_eh_in_progress(hba)) {
+ dev_err(hba->dev, "%s: Unhandled interrupt 0x%08x (0x%08x,
0x%08x)\n",
+ __func__,
+ intr_status,
+ hba->ufs_stats.last_intr_status,
+ enabled_intr_status);
ufshcd_dump_regs(hba, 0, UFSHCI_REG_SPACE_SIZE, "host_regs: ");
}
@@ -6348,7 +6353,10 @@ static int __ufshcd_issue_tm_cmd(struct ufs_hba
*hba,
* Even though we use wait_event() which sleeps indefinitely,
* the maximum wait time is bounded by %TM_CMD_TIMEOUT.
*/
- req = blk_get_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);
+ req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+
req->end_io_data = &wait;
free_slot = req->tag;
WARN_ON_ONCE(free_slot < 0 || free_slot >= hba->nutmrs);