Re: [PATCH v4 1/8] scsi: ufs: Flush exception event before suspend

Can Guo <cang@xxxxxxxxxxxxxx> · Mon, 03 Feb 2020 14:23:15 +0800

On 2020-01-26 11:29, Bart Van Assche wrote:
On 2020-01-22 23:25, Can Guo wrote:

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 1201578..c2de29f 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4760,8 +4760,15 @@ static void ufshcd_slave_destroy(struct 
scsi_device *sdev)
 			 * UFS device needs urgent BKOPs.
 			 */
 			if (!hba->pm_op_in_progress &&
-			    ufshcd_is_exception_event(lrbp->ucd_rsp_ptr))
-				schedule_work(&hba->eeh_work);
+			    ufshcd_is_exception_event(lrbp->ucd_rsp_ptr)) {
+				/*
+				 * Prevent suspend once eeh_work is scheduled
+				 * to avoid deadlock between ufshcd_suspend
+				 * and exception event handler.
+				 */
+				if (schedule_work(&hba->eeh_work))
+					pm_runtime_get_noresume(hba->dev);
+			}

Please combine the two logical tests with "&&" instead of nesting two
if-statements inside each other.

 			break;
 		case UPIU_TRANSACTION_REJECT_UPIU:
 			/* TODO: handle Reject UPIU Response */
@@ -5215,7 +5222,14 @@ static void 
ufshcd_exception_event_handler(struct work_struct *work)

 out:
 	scsi_unblock_requests(hba->host);
-	pm_runtime_put_sync(hba->dev);
+	/*
+	 * pm_runtime_get_noresume is called while scheduling
+	 * eeh_work to avoid suspend racing with exception work.
+	 * Hence decrement usage counter using pm_runtime_put_noidle
+	 * to allow suspend on completion of exception event handler.
+	 */
+	pm_runtime_put_noidle(hba->dev);
+	pm_runtime_put(hba->dev);
 	return;
 }

@@ -7901,6 +7915,7 @@ static int ufshcd_suspend(struct ufs_hba *hba, 
enum ufs_pm_op pm_op)
 			goto enable_gating;
 	}

+	flush_work(&hba->eeh_work);
 	ret = ufshcd_link_state_transition(hba, req_link_state, 1);
 	if (ret)
 		goto set_dev_active;

I think this patch introduces a new race condition, namely the 
following:
- ufshcd_slave_destroy() tests pm_op_in_progress and reads the value
  zero from that variable.
- ufshcd_suspend() sets hba->pm_op_in_progress to one.
- ufshcd_slave_destroy() calls schedule_work().

How about fixing this race condition by calling
pm_runtime_get_noresume() before checking pm_op_in_progress and by
reallowing resume if no work is scheduled?

Thanks,

Bart.

Hi Bart,

If you apply this patch, you will find the change is not in
ufshcd_slave_destroy(), but in ufshcd_transfer_rsp_status().
So the racing you mentioned above does not exist.

Thanks,

Can Guo.