On 11/19/19 7:38 PM, cang@xxxxxxxxxxxxxx wrote:
On 2019-11-20 07:16, Bart Van Assche wrote:
On 11/18/19 9:33 PM, cang@xxxxxxxxxxxxxx wrote:
[ ... ]
I am still not quite clear about the blk_freeze_queue() part.
In the new implementation of ufshcd_clock_scaling_prepare(), after
blk_freeze_queue_start(), we call blk_mq_freeze_queue_wait_timeout()
to wait for up to 1 second. In addition to the requests which have
already been queued to the HW doorbell, more requests can be dispatched
within that second, through the lowest gear. My first impression is
that it might be a bit lazy, and I am not sure how much it may affect
the benchmarks.
First of all, regarding the lines above, do you agree that there can be
additional latency in scaling up/down compared with the old
implementation?
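
To make the concern concrete, I assume the new prepare step looks
roughly like below (just my sketch based on this thread, not the actual
patch; the loop over the SCSI devices and the error handling are my
assumptions):

<--sketch-->
static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
{
	struct scsi_device *sdev;
	int ret = 0;

	/*
	 * Kill q_usage_counter of every queue; from here on, new
	 * blk_queue_enter() callers block in wait_event().
	 */
	shost_for_each_device(sdev, hba->host)
		blk_freeze_queue_start(sdev->request_queue);

	/* Wait up to 1 s per queue for started requests to drain. */
	shost_for_each_device(sdev, hba->host)
		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
						     HZ) == 0)
			ret = -ETIMEDOUT;

	/* On timeout the queues would still need to be unfrozen. */
	return ret;
}
<--end of sketch-->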
I can understand that after the queue is frozen, all calls to
blk_get_request() are blocked and no submission can be made past this
point, because blk_queue_enter() waits on q->mq_freeze_wq:
<--code snippet from blk_queue_enter()-->
wait_event(q->mq_freeze_wq,
	   (atomic_read(&q->mq_freeze_depth) == 0 &&
	    (pm || (blk_pm_request_resume(q),
		    !blk_queue_pm_only(q)))) ||
	   blk_queue_dying(q));
<--end of code snippet-->
And if the request queues are heavily loaded (many bios are waiting for
a free tag to become a request), is 1 second long enough to freeze all
of these request queues?
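
For reference, blk_mq_freeze_queue_wait_timeout() in block/blk-mq.c is
just a timed variant of this wait; it returns 0 if q->q_usage_counter
has not reached zero within the timeout:

<--code snippet-->
int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
				     unsigned long timeout)
{
	return wait_event_timeout(q->mq_freeze_wq,
				  percpu_ref_is_zero(&q->q_usage_counter),
				  timeout);
}
<--end of code snippet-->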
But here I meant those bios sent through submit_bio() before we call
blk_freeze_queue_start(), which have already entered
blk_mq_get_request() and already returned from blk_queue_enter_live().
These bios are waiting for a free tag (sleeping in blk_mq_get_tag()
while the queue is full). Can the request queue be frozen before these
bios are handled?
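
blk_queue_enter_live() (in block/blk.h of kernels from this era) takes
that reference unconditionally, without checking for a pending freeze:

<--code snippet-->
/*
 * Assumes the caller already holds a reference that keeps the queue
 * alive; bumps q_usage_counter without checking mq_freeze_depth.
 */
static inline void blk_queue_enter_live(struct request_queue *q)
{
	percpu_ref_get(&q->q_usage_counter);
}
<--end of code snippet-->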
void blk_mq_freeze_queue_wait(struct request_queue *q)
{
	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
Per my understanding, these bios have already incremented
q->q_usage_counter, and q->q_usage_counter only drops back to 0 once
all of the requests in the queue, plus these bios (after they get free
tags and are turned into requests), have been completed by the block
layer, i.e. after blk_queue_exit() has been called in
__blk_mq_free_request() for all of them. Please correct me if I am
wrong.
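
For completeness, the matching drop is blk_queue_exit() in
block/blk-core.c, which is just a percpu_ref_put():

<--code snippet-->
void blk_queue_exit(struct request_queue *q)
{
	percpu_ref_put(&q->q_usage_counter);
}
<--end of code snippet-->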
Hi Can,
Please have another look at the request queue freezing mechanism in the
block layer. If blk_queue_enter() sleeps in wait_event(), that implies
either that the percpu_ref_tryget_live() call failed, or that it
succeeded and was followed by a percpu_ref_put() call. Ignoring
concurrent q_usage_counter changes, in both cases q->q_usage_counter
has the same value as before blk_queue_enter() was called.
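
For reference, here is the relevant logic, condensed from
blk_queue_enter() in block/blk-core.c; note that the task never sleeps
while holding a q_usage_counter reference:

<--code snippet (condensed)-->
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	const bool pm = flags & BLK_MQ_REQ_PREEMPT;

	while (true) {
		bool success = false;

		rcu_read_lock();
		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
			if (pm || !blk_queue_pm_only(q))
				success = true;
			else	/* got a reference but must not keep it */
				percpu_ref_put(&q->q_usage_counter);
		}
		rcu_read_unlock();

		if (success)
			return 0;
		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		smp_rmb();

		/* Sleep without holding any q_usage_counter reference. */
		wait_event(q->mq_freeze_wq,
			   (atomic_read(&q->mq_freeze_depth) == 0 &&
			    (pm || (blk_pm_request_resume(q),
				    !blk_queue_pm_only(q)))) ||
			   blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}
<--end of code snippet-->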
In other words, there shouldn't be a latency difference between the
current and the proposed approach since neither approach waits for
completion of bios or SCSI commands that are queued after clock scaling
started.
Thanks,
Bart.