Re: [PATCH v6 4/4] ufs: Simplify the clock scaling mechanism implementation

On 2019-11-26 09:05, Bart Van Assche wrote:
On 11/24/19 11:38 PM, cang@xxxxxxxxxxxxxx wrote:
Sorry to bring back the old discussion we had on V5 of this series here. I spent some time understanding the blk-mq freeze queue implementation, but I still have the same concern regarding your new implementation of clock scaling prepare/unprepare.

I attached two logs I collected on my setups (the kernel is 5.4).

In log1.txt, I pretty much ported your changes to our code base and copied
a large file to partition /dev/sde1 (mounted to /mnt). You can see from the
debug logs I added that -ETIMEDOUT is returned from ufshcd_devfreq_scale(),
although scaling UP/DOWN succeeds some of the time.

To make it more controllable, I made a new sysfs entry, namely freeze_sde_queue, so that I can freeze the queue of /dev/sde whenever I do "echo 1 > freeze_sde_queue". After calling blk_freeze_queue_start(), blk_mq_freeze_queue_wait_timeout() is called to wait for 1 second. In the end, blk_mq_unfreeze_queue() is called whether or not the timeout happened.
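A minimal sketch of that debug hook (the sysfs attribute wiring and the way the scsi_device is looked up are only for illustration here, not the exact code on my setup):

#include <linux/blk-mq.h>
#include <linux/device.h>
#include <scsi/scsi_device.h>

static ssize_t freeze_sde_queue_store(struct device *dev,
                                      struct device_attribute *attr,
                                      const char *buf, size_t count)
{
        /* Assumption: the attribute hangs off the SCSI device of /dev/sde. */
        struct scsi_device *sdev = to_scsi_device(dev);
        struct request_queue *q = sdev->request_queue;
        int ret;

        blk_freeze_queue_start(q);
        /* Wait at most one second for in-flight requests to drain. */
        ret = blk_mq_freeze_queue_wait_timeout(q, HZ);
        pr_info("%s: freeze %s\n", __func__,
                ret > 0 ? "completed" : "timed out");
        /* Unfreeze no matter whether the timeout happened or not. */
        blk_mq_unfreeze_queue(q);
        return count;
}
static DEVICE_ATTR_WO(freeze_sde_queue);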

I also added a flag, called enable_print, to struct request_queue, and set it after calling blk_freeze_queue_start(). When this flag is set, the block layer prints logs from blk_mq_start_request() and blk_mq_get_request().
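Roughly, the instrumentation is a helper like the one below, called from blk_mq_get_request() and blk_mq_start_request() on my test kernel (enable_print is a field I added locally to struct request_queue; none of this is upstream code):

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Local debug helper, for illustration only. */
static inline void blk_mq_debug_print(struct request_queue *q,
                                      struct request *rq, const char *who)
{
        if (!q->enable_print)
                return;
        pr_info("%s: disk=%s tag=%d op=%u\n", who,
                rq->rq_disk ? rq->rq_disk->disk_name : "?",
                rq->tag, req_op(rq));
}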

I tried multiple times while copying a large file to partition /dev/sde1 (mounted to /mnt).

In log2.txt, there are still wait timeouts, and you can see that after blk_freeze_queue_start() is called for the request queue of /dev/sde, requests can still be sent down to the UFS driver. These requests are the "latency" that I mentioned in our previous discussion. I know they are not "new" requests that reach the block layer after the freeze starts, but requests that were already in the queue before the freeze started. However, they are what makes the difference from the old implementation: when scaling up should happen, these requests are still sent through the lowest HS Gear, while when scaling down should happen, they are sent through the highest HS Gear. The former may cause a performance issue and the latter a power penalty; this is our
major concern.

Hi Can,

Thank you for having taken the time to run these tests and to share their output. If I understood you correctly, the patch I posted works but introduces a performance regression, namely that it takes longer to suspend request processing for a UFS host. How about replacing patch 4/4 with the
patch below?

Thanks,

Bart.

Subject: [PATCH] ufs: Make clock scaling more robust

Scaling the clock is only safe while no commands are in progress. The
current clock scaling implementation uses hba->clk_scaling_lock to
serialize clock scaling against the following three functions:
* ufshcd_queuecommand()          (uses sdev->request_queue)
* ufshcd_exec_dev_cmd()          (uses hba->cmd_queue)
* ufshcd_issue_devman_upiu_cmd() (uses hba->cmd_queue)

__ufshcd_issue_tm_cmd(), which uses hba->tmf_queue, is not yet serialized against clock scaling. Also freeze hba->tmf_queue so that TMFs cannot be submitted
from user space while clock scaling is in progress. User space can submit a TMF as follows:

submit UPIU_TRANSACTION_TASK_REQ on an UFS BSG queue
-> ufs_bsg_request()
  -> ufshcd_exec_raw_upiu_cmd(msgcode = UPIU_TRANSACTION_TASK_REQ)
    -> ufshcd_issue_tm_upiu_cmd()
      -> __ufshcd_issue_tm_cmd()

Cc: Can Guo <cang@xxxxxxxxxxxxxx>
Cc: Bean Huo <beanhuo@xxxxxxxxxx>
Cc: Avri Altman <avri.altman@xxxxxxx>
Cc: Stanley Chu <stanley.chu@xxxxxxxxxxxx>
Cc: Tomas Winkler <tomas.winkler@xxxxxxxxx>
Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
---
 drivers/scsi/ufs/ufshcd.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 99337e0b54f6..631c588f279a 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -972,14 +972,13 @@ static bool ufshcd_is_devfreq_scaling_required(struct ufs_hba *hba,
 }

 static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
-					u64 wait_timeout_us)
+					unsigned long deadline)
 {
 	unsigned long flags;
 	int ret = 0;
 	u32 tm_doorbell;
 	u32 tr_doorbell;
 	bool timeout = false, do_last_check = false;
-	ktime_t start;

 	ufshcd_hold(hba, false);
 	spin_lock_irqsave(hba->host->host_lock, flags);
@@ -987,7 +986,6 @@ static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,
 	 * Wait for all the outstanding tasks/transfer requests.
 	 * Verify by checking the doorbell registers are clear.
 	 */
-	start = ktime_get();
 	do {
 		if (hba->ufshcd_state != UFSHCD_STATE_OPERATIONAL) {
 			ret = -EBUSY;
@@ -1005,8 +1003,7 @@ static int ufshcd_wait_for_doorbell_clr(struct ufs_hba *hba,

 		spin_unlock_irqrestore(hba->host->host_lock, flags);
 		schedule();
-		if (ktime_to_us(ktime_sub(ktime_get(), start)) >
-		    wait_timeout_us) {
+		if (time_after(jiffies, deadline)) {
 			timeout = true;
 			/*
 			 * We might have scheduled out for long time so make
@@ -1079,26 +1076,43 @@ static int ufshcd_scale_gear(struct ufs_hba *hba, bool scale_up)

 static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
 {
-	#define DOORBELL_CLR_TOUT_US		(1000 * 1000) /* 1 sec */
+	unsigned long deadline = jiffies + HZ /* 1 sec */;
 	int ret = 0;
 	/*
 	 * make sure that there are no outstanding requests when
 	 * clock scaling is in progress
 	 */
 	ufshcd_scsi_block_requests(hba);
+	blk_freeze_queue_start(hba->cmd_queue);
+	blk_freeze_queue_start(hba->tmf_queue);
+	if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
+				max_t(long, 1, deadline - jiffies)) <= 0)
+		goto unblock;
+	if (blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
+				max_t(long, 1, deadline - jiffies)) <= 0)
+		goto unblock;
 	down_write(&hba->clk_scaling_lock);

Hi Bart,

It looks better, but there is a race condition here. Consider the sequence below: thread A takes clk_scaling_lock and then waits in blk_get_request(), while
thread B has frozen the queue and is waiting for the lock.

Thread A
    Call ufshcd_exec_dev_cmd() or ufshcd_issue_devman_upiu_cmd()
    [a] down_read(hba->clk_scaling_lock)
    [d] blk_get_request()

Thread B
    Call ufshcd_clock_scaling_prepare()
    [b] blk_freeze_queue_start(hba->cmd_queue)
    [c] blk_mq_freeze_queue_wait_timeout(hba->cmd_queue) returns > 0
    [e] down_write(hba->clk_scaling_lock)
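Just to illustrate the point (a rough, untested sketch only, not a proposal): taking clk_scaling_lock before starting the freeze would avoid that particular interleaving, because a task that already holds the read lock can still finish its blk_get_request() before the queue starts freezing:

static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
{
        unsigned long deadline = jiffies + HZ /* 1 sec */;

        ufshcd_scsi_block_requests(hba);
        /* Take the lock first so that current read-lock holders can finish. */
        down_write(&hba->clk_scaling_lock);
        blk_freeze_queue_start(hba->cmd_queue);
        blk_freeze_queue_start(hba->tmf_queue);
        if (blk_mq_freeze_queue_wait_timeout(hba->cmd_queue,
                                max_t(long, 1, deadline - jiffies)) <= 0 ||
            blk_mq_freeze_queue_wait_timeout(hba->tmf_queue,
                                max_t(long, 1, deadline - jiffies)) <= 0 ||
            ufshcd_wait_for_doorbell_clr(hba, deadline)) {
                up_write(&hba->clk_scaling_lock);
                blk_mq_unfreeze_queue(hba->tmf_queue);
                blk_mq_unfreeze_queue(hba->cmd_queue);
                ufshcd_scsi_unblock_requests(hba);
                return -EBUSY;
        }
        return 0;
}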

BTW, I see no need to freeze hba->cmd_queue in scaling_prepare().
I went through our previous discussions and you mentioned that freezing hba->cmd_queue
can serialize scaling against the error handler.
However, that is not necessary and not 100% true. We already have the ufshcd_eh_in_progress()
check in ufshcd_devfreq_target() before calling ufshcd_devfreq_scale().
If you think this is not enough (the error handler may kick off after this check), having hba->cmd_queue frozen in scaling_prepare() does not mean the error handler has finished either, because device management commands are only used in certain steps of the error handler.
Actually, with the original design, we don't see any problems caused by
concurrency of scaling and the error handler, and if that concurrency really happens,
scaling would just fail.

Thanks,

Can Guo.

-	if (ufshcd_wait_for_doorbell_clr(hba, DOORBELL_CLR_TOUT_US)) {
-		ret = -EBUSY;
-		up_write(&hba->clk_scaling_lock);
-		ufshcd_scsi_unblock_requests(hba);
-	}
+	if (ufshcd_wait_for_doorbell_clr(hba, deadline))
+		goto up_write;

+out:
 	return ret;
+
+up_write:
+	up_write(&hba->clk_scaling_lock);
+unblock:
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	blk_mq_unfreeze_queue(hba->cmd_queue);
+	ufshcd_scsi_unblock_requests(hba);
+	ret = -EBUSY;
+	goto out;
 }

 static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
 {
 	up_write(&hba->clk_scaling_lock);
+	blk_mq_unfreeze_queue(hba->tmf_queue);
+	blk_mq_unfreeze_queue(hba->cmd_queue);
 	ufshcd_scsi_unblock_requests(hba);
 }


