On 2019-11-19 02:13, Bart Van Assche wrote:
On 11/14/19 10:01 PM, Can Guo wrote:
On 2019-11-15 00:11, Bart Van Assche wrote:
On 11/13/19 8:03 AM, Bean Huo (beanhuo) wrote:
I think you are asking for a comment from Can. As for me, the attached
patch is better. Removing ufshcd_wait_for_doorbell_clr() and, instead of
reading the doorbell register, using the block layer
blk_mq_{un,}freeze_queue() looks good. I tested your v5 patches and
haven't found a problem yet.
Since my available platform doesn't support dynamic clock scaling, and it
seems only Qualcomm UFS controllers support this feature, we need Can Guo
to finally confirm it.
Do you agree with this patch series if patch 4 is replaced by the
patch attached to my previous e-mail? The entire (revised) series is
available at https://github.com/bvanassche/linux/tree/ufs-for-next.
After ufshcd_clock_scaling_prepare() returns (no error), all request
queues are frozen. If a failure happens after this point (say, a power
mode change command fails) and the error handler kicks off, we need to
send device management commands (in ufshcd_probe_hba()) to bring UFS back
to functionality. However, as hba->cmd_queue is frozen, device management
commands cannot be sent, so the error handler will fail.
Hi Can,
My understanding of the current UFS driver code is that
ufshcd_clock_scaling_prepare() waits for ongoing commands to finish but
not for SCSI error handling to finish. Would you agree with changing that
behavior such that ufshcd_clock_scaling_prepare() returns an error code
if SCSI error handling is in progress? Do you agree that once that change
has been made it is fine to invoke blk_freeze_queue_start() for all three
types of block layer request queues (SCSI commands, device management
commands and TMFs)? Do you agree that this would fix the issue that it is
possible today to submit TMFs from user space through the BSG queue while
clock scaling is in progress?
Thanks,
Bart.
Hi Bart,
Your understanding is correct.
Would you agree with changing that behavior such that
ufshcd_clock_scaling_prepare()
returns an error code if SCSI error handling is in progress?
I believe we already have the ufshcd_eh_in_progress() check in
ufshcd_devfreq_target() before calling ufshcd_devfreq_scale().
Do you agree that once that change has been made that it is fine to
invoke
blk_freeze_queue_start() for all three types of block layer request
queues (SCSI commands, device management commands and TMFs)?
I agree. As the error handling work always runs asynchronously in the
current code base, it is fine to freeze all the queues. Any error
handling work scheduled during scaling would eventually be unblocked when
hba->cmd_queue is unfrozen after scaling completes or fails.
Sorry for the confusion. My previous comment was based on the case where
we need to do hibern8 enter/exit in vendor ops (see
https://lore.kernel.org/patchwork/patch/1143579/): if hibern8 enter/exit
fails during scaling, we need to call ufshcd_link_recovery() to recover
everything synchronously. And as hba->cmd_queue is frozen,
ufshcd_link_recovery() would hang when it needs to send device management
commands. In that case, we need to make sure hba->cmd_queue is NOT
frozen. So it seems contradictory.
FYI, even with the patch
https://lore.kernel.org/patchwork/patch/1143579/, the current
implementation also has this problem. We have another change to address
it by modifying ufshcd_exec_dev_cmd() as below.
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c2a2ff2..81a000e 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -4068,6 +4068,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	int tag;
 	struct completion wait;
 	unsigned long flags;
+	bool has_read_lock = false;

@@ -4075,8 +4076,10 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
+	if (!ufshcd_eh_in_progress(hba)) {
 		down_read(&hba->lock);
+		has_read_lock = true;
+	}

 	/*
 	 * Get free slot, sleep if slots are unavailable.
@@ -4110,7 +4113,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 out_put_tag:
 	ufshcd_put_dev_cmd_tag(hba, tag);
 	wake_up(&hba->dev_cmd.tag_wq);
+	if (has_read_lock)
 		up_read(&hba->lock);
 	return err;
 }
Aside from the functionality, we need to take this change carefully as it
may affect performance. With the old implementation, when devfreq decides
to scale up, we can make sure that all requests sent to the UFS device
after this point go through the highest UFS gear. Scaling up happens
right after the doorbell is cleared; there is no need to wait until the
requests are freed from the block layer. And 1 second is long enough for
the HW to clear the doorbell.
In the new implementation of ufshcd_clock_scaling_prepare(), after
blk_freeze_queue_start(), we call blk_mq_freeze_queue_wait_timeout() to
wait for up to 1 second. In addition to the requests which have already
been queued to the HW doorbell, more requests will be dispatched within
that second, through the lowest gear. My first impression is that it
might be a bit slow, and I am not sure how much it may affect benchmarks.
And if the request queues are heavily loaded (many bios are waiting for a
free tag to become a request), is 1 second long enough to freeze all
these request queues? If not, performance would suffer whenever scaling
up fails.
As per my understanding, a request queue only becomes frozen after all
the existing requests, and the bios already waiting for a free tag on the
queue, have been completed and freed from the block layer. Please correct
me if I am wrong.
Whereas in the old implementation, when devfreq decides to scale up,
scaling can almost always happen, except for unexpected HW issues.
Best Regards,
Can Guo