On 4/11/2023 6:14 AM, Powen Kao (高伯文) wrote:
Hi Bao D.,
We have run some tests based on your RFC v3 patches and found an issue,
reported below.
Hi Powen,
Thank you very much for catching this issue. It seems that I cannot use
read_poll_timeout() here because it sleeps while the spin_lock() is held.
In the next revision I will change the code to poll the registers in a
tight loop with a udelay(20) polling interval.
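Roughly what I have in mind, as a sketch only (reusing the names from the
patch below; waited_us is a new local; read_poll_timeout_atomic() from
<linux/iopoll.h> would be an equivalent udelay()-based helper):

	/*
	 * Sketch, not the final patch: poll SQRTSy.CUS without sleeping,
	 * because hwq->sq_lock is held around this section.
	 */
	u32 waited_us = 0;

	reg = opr_sqd_base + REG_SQRTS;
	err = -ETIMEDOUT;
	while (waited_us < MCQ_POLL_US) {
		val = readl(reg);
		if (val & SQ_CUS) {
			err = 0;
			break;
		}
		udelay(20);
		waited_us += 20;
	}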
Thanks,
Bao
kworker/u16:4: BUG: scheduling while atomic: kworker/u16:4/5736/0x00000002
kworker/u16:4: [name:core&]Preemption disabled at:
kworker/u16:4: [<ffffffef97e33024>] ufshcd_mcq_sq_cleanup+0x9c/0x27c
kworker/u16:4: CPU: 2 PID: 5736 Comm: kworker/u16:4 Tainted: G S W OE
kworker/u16:4: Workqueue: ufs_eh_wq_0 ufshcd_err_handler
kworker/u16:4: Call trace:
kworker/u16:4: dump_backtrace+0x108/0x15c
kworker/u16:4: show_stack+0x20/0x30
kworker/u16:4: dump_stack_lvl+0x6c/0x8c
kworker/u16:4: dump_stack+0x20/0x44
kworker/u16:4: __schedule_bug+0xd4/0x100
kworker/u16:4: __schedule+0x660/0xa5c
kworker/u16:4: schedule+0x80/0xec
kworker/u16:4: schedule_hrtimeout_range_clock+0xa0/0x140
kworker/u16:4: schedule_hrtimeout_range+0x1c/0x30
kworker/u16:4: usleep_range_state+0x88/0xd8
kworker/u16:4: ufshcd_mcq_sq_cleanup+0x170/0x27c
kworker/u16:4: ufshcd_clear_cmds+0x78/0x184
kworker/u16:4: ufshcd_wait_for_dev_cmd+0x234/0x348
kworker/u16:4: ufshcd_exec_dev_cmd+0x220/0x298
kworker/u16:4: ufshcd_verify_dev_init+0x68/0x124
kworker/u16:4: ufshcd_probe_hba+0x390/0x9bc
kworker/u16:4: ufshcd_host_reset_and_restore+0x74/0x158
kworker/u16:4: ufshcd_reset_and_restore+0x70/0x31c
kworker/u16:4: ufshcd_err_handler+0xad4/0xe58
kworker/u16:4: process_one_work+0x214/0x5b8
kworker/u16:4: worker_thread+0x2d4/0x448
kworker/u16:4: kthread+0x110/0x1e0
kworker/u16:4: ret_from_fork+0x10/0x20
kworker/u16:4: ------------[ cut here ]------------
On Wed, 2023-03-29 at 03:01 -0700, Bao D. Nguyen wrote:
+/**
+ * ufshcd_mcq_sq_cleanup - Clean up Submission Queue resources
+ * associated with the pending command.
+ * @hba - per adapter instance.
+ * @task_tag - The command's task tag.
+ * @result - Result of the Clean up operation.
+ *
+ * Returns 0 and result on completion. Returns error code if
+ * the operation fails.
+ */
+int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag, int *result)
+{
+	struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
+	struct scsi_cmnd *cmd = lrbp->cmd;
+	struct ufs_hw_queue *hwq;
+	void __iomem *reg, *opr_sqd_base;
+	u32 nexus, i, val;
+	int err;
+
+	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
+		if (!cmd)
+			return FAILED;
+		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+	} else {
+		hwq = hba->dev_cmd_queue;
+	}
+
+	i = hwq->id;
+
+	spin_lock(&hwq->sq_lock);
As spin_lock() disables preemption, nothing in this section is allowed to
sleep.
+
+	/* stop the SQ fetching before working on it */
+	err = ufshcd_mcq_sq_stop(hba, hwq);
+	if (err)
+		goto unlock;
+
+	/* SQCTI = EXT_IID, IID, LUN, Task Tag */
+	nexus = lrbp->lun << 8 | task_tag;
+	opr_sqd_base = mcq_opr_base(hba, OPR_SQD, i);
+	writel(nexus, opr_sqd_base + REG_SQCTI);
+
+	/* SQRTCy.ICU = 1 */
+	writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
+
+	/* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
+	reg = opr_sqd_base + REG_SQRTS;
+	err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
+				MCQ_POLL_US, false, reg);
read_poll_timeout() was ufshcd_mcq_poll_register() in the last patch,
right? ufshcd_mcq_poll_register() calls usleep_range(), which causes the
kernel exception (KE) reported above. The same issue seems to still exist
here, since read_poll_timeout() also sleeps.
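For reference, roughly what the helper expands to when a non-zero sleep_us
is passed (paraphrased from include/linux/iopoll.h, not a verbatim quote);
the usleep_range() call is the same path that shows up in the backtrace
above:

	/* read_poll_timeout(op, val, cond, sleep_us, timeout_us, ...) */
	might_sleep_if(sleep_us != 0);
	for (;;) {
		val = op(args);
		if (cond)
			break;
		/* timeout handling elided */
		if (sleep_us)
			usleep_range((sleep_us >> 2) + 1, sleep_us);
	}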
Skipping ufshcd_mcq_sq_cleanup() by returning FAILED directly, so that the
ufshcd error handler triggers a reset, successfully recovers the host.
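Roughly what we did locally, just as a test hack at the top of
ufshcd_mcq_sq_cleanup(), not a proposed fix:

	/*
	 * Test hack only: skip the SQ clean-up so the error handler
	 * falls back to a full host reset.
	 */
	return FAILED;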
BTW, is there a change list between RFC v3 and this v1 patch? :)
Thanks
Po-Wen