Hi Bao D.,

We have run some tests based on your RFC v3 patches and the following
issue was reported:

kworker/u16:4: BUG: scheduling while atomic: kworker/u16:4/5736/0x00000002
kworker/u16:4: [name:core&]Preemption disabled at:
kworker/u16:4: [<ffffffef97e33024>] ufshcd_mcq_sq_cleanup+0x9c/0x27c
kworker/u16:4: CPU: 2 PID: 5736 Comm: kworker/u16:4 Tainted: G S W OE
kworker/u16:4: Workqueue: ufs_eh_wq_0 ufshcd_err_handler
kworker/u16:4: Call trace:
kworker/u16:4:  dump_backtrace+0x108/0x15c
kworker/u16:4:  show_stack+0x20/0x30
kworker/u16:4:  dump_stack_lvl+0x6c/0x8c
kworker/u16:4:  dump_stack+0x20/0x44
kworker/u16:4:  __schedule_bug+0xd4/0x100
kworker/u16:4:  __schedule+0x660/0xa5c
kworker/u16:4:  schedule+0x80/0xec
kworker/u16:4:  schedule_hrtimeout_range_clock+0xa0/0x140
kworker/u16:4:  schedule_hrtimeout_range+0x1c/0x30
kworker/u16:4:  usleep_range_state+0x88/0xd8
kworker/u16:4:  ufshcd_mcq_sq_cleanup+0x170/0x27c
kworker/u16:4:  ufshcd_clear_cmds+0x78/0x184
kworker/u16:4:  ufshcd_wait_for_dev_cmd+0x234/0x348
kworker/u16:4:  ufshcd_exec_dev_cmd+0x220/0x298
kworker/u16:4:  ufshcd_verify_dev_init+0x68/0x124
kworker/u16:4:  ufshcd_probe_hba+0x390/0x9bc
kworker/u16:4:  ufshcd_host_reset_and_restore+0x74/0x158
kworker/u16:4:  ufshcd_reset_and_restore+0x70/0x31c
kworker/u16:4:  ufshcd_err_handler+0xad4/0xe58
kworker/u16:4:  process_one_work+0x214/0x5b8
kworker/u16:4:  worker_thread+0x2d4/0x448
kworker/u16:4:  kthread+0x110/0x1e0
kworker/u16:4:  ret_from_fork+0x10/0x20
kworker/u16:4: ------------[ cut here ]------------

On Wed, 2023-03-29 at 03:01 -0700, Bao D. Nguyen wrote:
> +/**
> + * ufshcd_mcq_sq_cleanup - Clean up Submission Queue resources
> + * associated with the pending command.
> + * @hba - per adapter instance.
> + * @task_tag - The command's task tag.
> + * @result - Result of the Clean up operation.
> + *
> + * Returns 0 and result on completion. Returns error code if
> + * the operation fails.
> + */
> +int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag, int *result)
> +{
> +        struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
> +        struct scsi_cmnd *cmd = lrbp->cmd;
> +        struct ufs_hw_queue *hwq;
> +        void __iomem *reg, *opr_sqd_base;
> +        u32 nexus, i, val;
> +        int err;
> +
> +        if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
> +                if (!cmd)
> +                        return FAILED;
> +                hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> +        } else {
> +                hwq = hba->dev_cmd_queue;
> +        }
> +
> +        i = hwq->id;
> +
> +        spin_lock(&hwq->sq_lock);

spin_lock() disables preemption here.

> +
> +        /* stop the SQ fetching before working on it */
> +        err = ufshcd_mcq_sq_stop(hba, hwq);
> +        if (err)
> +                goto unlock;
> +
> +        /* SQCTI = EXT_IID, IID, LUN, Task Tag */
> +        nexus = lrbp->lun << 8 | task_tag;
> +        opr_sqd_base = mcq_opr_base(hba, OPR_SQD, i);
> +        writel(nexus, opr_sqd_base + REG_SQCTI);
> +
> +        /* SQRTCy.ICU = 1 */
> +        writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
> +
> +        /* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
> +        reg = opr_sqd_base + REG_SQRTS;
> +        err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
> +                                MCQ_POLL_US, false, reg);

read_poll_timeout() here was ufshcd_mcq_poll_register() in the last
patch, right? ufshcd_mcq_poll_register() calls usleep_range(), which
causes the kernel exception (KE) reported above. The same issue seems
to still exist, because read_poll_timeout() also sleeps while
hwq->sq_lock is held.

Skipping ufshcd_mcq_sq_cleanup() by returning FAILED directly, so that
the ufshcd error handler triggers a reset, successfully recovers the
host. (A rough, untested sketch of a non-sleeping polling alternative
is at the end of this mail.)

BTW, is there maybe a change list between RFC v3 and this v1 patch? :)

Thanks,
Po-Wen
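
For reference, one way to avoid sleeping while hwq->sq_lock is held
might be to poll atomically, e.g. with read_poll_timeout_atomic() from
<linux/iopoll.h>, which busy-waits with udelay() instead of calling
usleep_range(). This is only an untested sketch based on the quoted
hunk, not a tested fix; the 20us delay and MCQ_POLL_US timeout are
simply carried over from your patch:

        /* Poll SQRTSy.CUS = 1 without sleeping; sq_lock is held */
        reg = opr_sqd_base + REG_SQRTS;
        err = read_poll_timeout_atomic(readl, val, val & SQ_CUS, 20,
                                       MCQ_POLL_US, false, reg);

Whether busy-waiting for up to MCQ_POLL_US with preemption disabled is
acceptable here is a separate question, of course.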