Re: [PATCH v2 5/5] ufs: core: Add error handling for MCQ mode

Bart Van Assche <bvanassche@xxxxxxx> · Thu, 4 May 2023 11:04:58 -0700




On 5/3/23 21:18, Bao D. Nguyen wrote:
On 4/25/2023 5:21 PM, Bart Van Assche wrote:
On 4/17/23 14:05, Bao D. Nguyen wrote:
+        /* MCQ mode */
+        if (is_mcq_enabled(hba))
+            return ufshcd_clear_cmds(hba, 1UL << lrbp->task_tag);

The above code will trigger an overflow if lrbp->task_tag >= 8 * sizeof(unsigned long). That's not acceptable.
This ufshcd_clear_cmds() uses a bit map. There are multiple places in the UFS code have this limitation if the queue depth grows to be greater than 64. I am thinking:
1. Current ufs controllers in the market probably support queue depth 64 or less, so it may not be a problem today if host controller cap is set to 64 queue depth, but can be a problem in multiple places in the code later.
2. In mcq mode, we can pass a tag number into this api ufshcd_clear_cmds(); while in SDB mode, pass the tag's bit mask as before.
3. Use sbitmask() to support large queue depth? Thanks for any suggestions.

The UFS driver is the only block driver I know that tracks which commands
are pending in a bitmap. Please pass the lrbp pointer or the task_tag directly
to ufshcd_clear_cmds() instead of passing a bitmap to that function. Please
also introduce a loop in ufshcd_eh_device_reset_handler() around the
ufshcd_clear_cmds() call instead of passing a bitmap to ufshcd_clear_cmds().

  static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
  {
+    struct ufshcd_lrb *lrbp;
+    u32 hwq_num, utag;
+    int tag;
+
      /* Resetting interrupt aggregation counters first and reading the
       * DOOR_BELL afterward allows us to handle all the completed requests.
       * In order to prevent other interrupts starvation the DB is read once
@@ -5580,7 +5590,22 @@ static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
       * Ignore the ufshcd_poll() return value and return IRQ_HANDLED since we
       * do not want polling to trigger spurious interrupt complaints.
       */
-    ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT);
+    if (!is_mcq_enabled(hba)) {
+        ufshcd_poll(hba->host, UFSHCD_POLL_FROM_INTERRUPT_CONTEXT);
+        goto out;
+    }
+
+    /* MCQ mode */
+    for (tag = 0; tag < hba->nutrs; tag++) {
+        lrbp = &hba->lrb[tag];
+        if (lrbp->cmd) {
+            utag = blk_mq_unique_tag(scsi_cmd_to_rq(lrbp->cmd));
+            hwq_num = blk_mq_unique_tag_to_hwq(utag);
+            ufshcd_poll(hba->host, hwq_num);
+        }
+    }

Is my understanding correct that ufshcd_transfer_req_compl() is only called from single doorbell code paths and hence that the above change is not necessary?
ufshcd_transfer_req_compl() can be invoked from MCQ mode such as the ufshcd_err_handler() as below:
ufshcd_err_handler()-->ufshcd_complete_requests()-->ufshcd_transfer_req_compl()

Since there are multiple statements in ufshcd_transfer_req_compl() that assume
SDB mode (resetting SDB interrupt aggregation and calling ufshcd_poll()), please
move the is_mcq_enabled() test from ufshcd_transfer_req_compl() into the
callers of that function.

Thanks,

Bart.