On 4/11/2023 6:14 AM, Powen Kao (高伯文) wrote:
Hi Bao D.,
We have run some tests based on your RFC v3 patches and found an issue,
reported below.
Hi Powen,
Thank you very much for catching this issue. It seems that I cannot use
read_poll_timeout() here because it sleeps while the spin_lock() is held.
In the next revision I will change the code to poll the registers in a
tight loop with a udelay(20) polling interval.
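Roughly what I have in mind, as a sketch only (reusing the names from the
patch below; waited_us is a new local; read_poll_timeout_atomic() from
<linux/iopoll.h> would be an equivalent udelay()-based helper):

	/*
	 * Sketch, not the final patch: poll SQRTSy.CUS without sleeping,
	 * because hwq->sq_lock is held around this section.
	 */
	u32 waited_us = 0;

	reg = opr_sqd_base + REG_SQRTS;
	err = -ETIMEDOUT;
	while (waited_us < MCQ_POLL_US) {
		val = readl(reg);
		if (val & SQ_CUS) {
			err = 0;
			break;
		}
		udelay(20);
		waited_us += 20;
	}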
Thanks,
Bao
kworker/u16:4: BUG: scheduling while atomic: kworker/u16:4/5736/0x00000002
kworker/u16:4: [name:core&]Preemption disabled at:
kworker/u16:4: [<ffffffef97e33024>] ufshcd_mcq_sq_cleanup+0x9c/0x27c
kworker/u16:4: CPU: 2 PID: 5736 Comm: kworker/u16:4 Tainted: G S W OE
kworker/u16:4: Workqueue: ufs_eh_wq_0 ufshcd_err_handler
kworker/u16:4: Call trace:
kworker/u16:4: dump_backtrace+0x108/0x15c
kworker/u16:4: show_stack+0x20/0x30
kworker/u16:4: dump_stack_lvl+0x6c/0x8c
kworker/u16:4: dump_stack+0x20/0x44
kworker/u16:4: __schedule_bug+0xd4/0x100
kworker/u16:4: __schedule+0x660/0xa5c
kworker/u16:4: schedule+0x80/0xec
kworker/u16:4: schedule_hrtimeout_range_clock+0xa0/0x140
kworker/u16:4: schedule_hrtimeout_range+0x1c/0x30
kworker/u16:4: usleep_range_state+0x88/0xd8
kworker/u16:4: ufshcd_mcq_sq_cleanup+0x170/0x27c
kworker/u16:4: ufshcd_clear_cmds+0x78/0x184
kworker/u16:4: ufshcd_wait_for_dev_cmd+0x234/0x348
kworker/u16:4: ufshcd_exec_dev_cmd+0x220/0x298
kworker/u16:4: ufshcd_verify_dev_init+0x68/0x124
kworker/u16:4: ufshcd_probe_hba+0x390/0x9bc
kworker/u16:4: ufshcd_host_reset_and_restore+0x74/0x158
kworker/u16:4: ufshcd_reset_and_restore+0x70/0x31c
kworker/u16:4: ufshcd_err_handler+0xad4/0xe58
kworker/u16:4: process_one_work+0x214/0x5b8
kworker/u16:4: worker_thread+0x2d4/0x448
kworker/u16:4: kthread+0x110/0x1e0
kworker/u16:4: ret_from_fork+0x10/0x20
kworker/u16:4: ------------[ cut here ]------------
On Wed, 2023-03-29 at 03:01 -0700, Bao D. Nguyen wrote:
+/**
+ * ufshcd_mcq_sq_cleanup - Clean up Submission Queue resources
+ * associated with the pending command.
+ * @hba - per adapter instance.
+ * @task_tag - The command's task tag.
+ * @result - Result of the Clean up operation.
+ *
+ * Returns 0 and result on completion. Returns error code if
+ * the operation fails.
+ */
+int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba, int task_tag, int *result)
+{
+	struct ufshcd_lrb *lrbp = &hba->lrb[task_tag];
+	struct scsi_cmnd *cmd = lrbp->cmd;
+	struct ufs_hw_queue *hwq;
+	void __iomem *reg, *opr_sqd_base;
+	u32 nexus, i, val;
+	int err;
+
+	if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
+		if (!cmd)
+			return FAILED;
+		hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
+	} else {
+		hwq = hba->dev_cmd_queue;
+	}
+
+	i = hwq->id;
+
+	spin_lock(&hwq->sq_lock);
As spin_lock() disables preemption, nothing in this section is allowed to
sleep.
+
+	/* stop the SQ fetching before working on it */
+	err = ufshcd_mcq_sq_stop(hba, hwq);
+	if (err)
+		goto unlock;
+
+	/* SQCTI = EXT_IID, IID, LUN, Task Tag */
+	nexus = lrbp->lun << 8 | task_tag;
+	opr_sqd_base = mcq_opr_base(hba, OPR_SQD, i);
+	writel(nexus, opr_sqd_base + REG_SQCTI);
+
+	/* SQRTCy.ICU = 1 */
+	writel(SQ_ICU, opr_sqd_base + REG_SQRTC);
+
+	/* Poll SQRTSy.CUS = 1. Return result from SQRTSy.RTC */
+	reg = opr_sqd_base + REG_SQRTS;
+	err = read_poll_timeout(readl, val, val & SQ_CUS, 20,
+				MCQ_POLL_US, false, reg);
read_poll_timeout() was ufshcd_mcq_poll_register() in the last patch,
right? ufshcd_mcq_poll_register() calls usleep_range(), which causes the
kernel exception (KE) reported above. The same issue seems to still exist
here, since read_poll_timeout() also sleeps.
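For reference, roughly what the helper expands to when a non-zero sleep_us
is passed (paraphrased from include/linux/iopoll.h, not a verbatim quote);
the usleep_range() call is the same path that shows up in the backtrace
above:

	/* read_poll_timeout(op, val, cond, sleep_us, timeout_us, ...) */
	might_sleep_if(sleep_us != 0);
	for (;;) {
		val = op(args);
		if (cond)
			break;
		/* timeout handling elided */
		if (sleep_us)
			usleep_range((sleep_us >> 2) + 1, sleep_us);
	}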
Skipping ufshcd_mcq_sq_cleanup() by returning FAILED directly, so that the
ufshcd error handler triggers a reset, successfully recovers the host.
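Roughly what we did locally, just as a test hack at the top of
ufshcd_mcq_sq_cleanup(), not a proposed fix:

	/*
	 * Test hack only: skip the SQ clean-up so the error handler
	 * falls back to a full host reset.
	 */
	return FAILED;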
BTW, is there a change list between RFC v3 and this v1 patch? :)
Thanks
Po-Wen