Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation

On 2019-11-20 07:16, Bart Van Assche wrote:
On 11/18/19 9:33 PM, cang@xxxxxxxxxxxxxx wrote:
Aside from the functionality, we need to treat this change carefully, as it may affect performance. With the old implementation, when devfreq decides to scale up, we can be sure that all requests sent to the UFS device after this point will go through the highest UFS Gear. Scaling up happens right after the doorbell is cleared; there is no need to wait until the requests are freed from the block layer.
And 1 sec is long enough for the HW to clear the doorbell.
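
For illustration, the old mechanism boils down to a doorbell-draining loop along these lines (a rough sketch, not the exact ufshcd.c code; the function name here is made up, only the register names follow the UFSHCI definitions):

/*
 * Rough sketch of a doorbell-draining wait: poll both doorbell
 * registers until no transfer/task requests are outstanding, or
 * give up after a timeout. Clock scaling proceeds only once the
 * doorbells are clear.
 */
static bool ufs_wait_doorbell_clear(struct ufs_hba *hba, u64 timeout_us)
{
	u32 tr_doorbell, tm_doorbell;
	ktime_t start = ktime_get();

	do {
		tm_doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
		tr_doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
		if (!tm_doorbell && !tr_doorbell)
			return true;	/* HW has drained everything it was given */
		usleep_range(50, 100);
	} while (ktime_us_delta(ktime_get(), start) < timeout_us);

	return false;	/* doorbell did not clear in time; skip scaling */
}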

In the new implementation of ufshcd_clock_scaling_prepare(), after blk_freeze_queue_start() we call blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to the requests that have already been queued to the HW doorbell, more requests will be dispatched within that 1 sec, through the lowest Gear. My first impression is that this might be a bit lazy, and I am not sure how much it may affect benchmarks.

And if the request queues are heavily loaded (many bios are waiting for a free tag to become a request), is 1 sec long enough to freeze all these request queues? If not, performance would be affected if scaling up fails frequently.
As per my understanding, the request queue only becomes frozen after all the existing requests and bios (which are already waiting for a free tag on the queue) have been completed and freed from the block layer.
Please correct me if I am wrong.

Whereas with the old implementation, when devfreq decides to scale up, scaling can happen in the vast majority of cases, barring unexpected HW issues.

Hi Can,

I agree with the description above of the current behavior of the UFS
driver, but I do not agree with the interpretation of the behavior of
the changes I proposed. The block layer implementation of request
queue freezing is as follows:

void blk_freeze_queue_start(struct request_queue *q)
{
	mutex_lock(&q->mq_freeze_lock);
	if (++q->mq_freeze_depth == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		mutex_unlock(&q->mq_freeze_lock);
		if (queue_is_mq(q))
			blk_mq_run_hw_queues(q, false);
	} else {
		mutex_unlock(&q->mq_freeze_lock);
	}
}

As long as calls from other threads to
blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are balanced, after
blk_freeze_queue_start() returns it is guaranteed that
q->q_usage_counter is in __PERCPU_REF_DEAD mode. Calls to
blk_get_request() etc. block in that mode until the request queue is
unfrozen. See also blk_queue_enter().
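
Roughly, blk_queue_enter() behaves like this (a simplified paraphrase from block/blk-core.c with the pm/pm_only handling omitted; not the verbatim source):

/* Simplified paraphrase of blk_queue_enter(), pm_only handling omitted. */
int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
{
	while (true) {
		/*
		 * Fast path: only succeeds while q_usage_counter is "live",
		 * i.e. percpu_ref_kill() has not been called by a freeze.
		 */
		if (percpu_ref_tryget_live(&q->q_usage_counter))
			return 0;

		if (flags & BLK_MQ_REQ_NOWAIT)
			return -EBUSY;

		/*
		 * Slow path: sleep until the queue is unfrozen
		 * (mq_freeze_depth back to 0) or the queue is dying.
		 */
		wait_event(q->mq_freeze_wq,
			   q->mq_freeze_depth == 0 || blk_queue_dying(q));
		if (blk_queue_dying(q))
			return -ENODEV;
	}
}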

In other words, calling blk_freeze_queue() from inside
ufshcd_clock_scaling_prepare() should preserve the current behavior,
namely that ufshcd_clock_scaling_prepare() waits for ongoing requests
to finish and also that submission of new requests is postponed until
clock scaling has finished. Do you agree with this analysis?
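
To make that concrete, a minimal sketch of what the prepare/unprepare pair could look like (my reading of the approach, not necessarily the exact patch; the function names and the 1 second timeout are assumptions, only the blk-mq and SCSI helpers are real):

static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba)
{
	struct scsi_device *sdev;

	/* Resurrect q_usage_counter so blk_queue_enter() stops blocking. */
	shost_for_each_device(sdev, hba->host)
		blk_mq_unfreeze_queue(sdev->request_queue);
}

static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba)
{
	struct scsi_device *sdev;
	int ret = 0;

	/* Kill q_usage_counter: new submitters block in blk_queue_enter(). */
	shost_for_each_device(sdev, hba->host)
		blk_freeze_queue_start(sdev->request_queue);

	/* Bounded wait for already-started requests to complete. */
	shost_for_each_device(sdev, hba->host) {
		if (blk_mq_freeze_queue_wait_timeout(sdev->request_queue,
						     HZ) <= 0) {
			/* Drop the reference held by the iterator before breaking out. */
			scsi_device_put(sdev);
			ret = -ETIMEDOUT;
			break;
		}
	}

	if (ret)
		ufshcd_clock_scaling_unprepare(hba);

	return ret;
}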

Thanks,

Bart.

Hi Bart,

I am still not quite clear about the blk_freeze_queue() part.

In the new implementation of ufshcd_clock_scaling_prepare(), after blk_freeze_queue_start() we call blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to the requests that have already been queued to the HW doorbell, more requests will be dispatched within that 1 sec, through the lowest Gear. My first impression is that this might be a bit lazy, and I am not sure how much it may affect benchmarks.

First of all, regarding the lines above, do you agree that there can be latency in scaling up/down compared with the old implementation?

I can understand that after the queue is frozen, all calls to blk_get_request()
are blocked and no submission can be made past this point, because
blk_queue_enter() waits on q->mq_freeze_wq:

<--code snippet-->
wait_event(q->mq_freeze_wq,
           (atomic_read(&q->mq_freeze_depth) == 0 &&
<--code snippet-->

And if the request queues are heavily loaded (many bios are waiting for a free tag to become a request),
is 1 sec long enough to freeze all these request queues?

But here I meant those bios which, before we call blk_freeze_queue_start(), were sent through submit_bio() and have already entered blk_mq_get_request() and returned from blk_queue_enter_live(). These bios are waiting for a free tag
(waiting in blk_mq_get_tag() when the queue is full).
Can the request queue become frozen before these bios are handled?

void blk_mq_freeze_queue_wait(struct request_queue *q)
{
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
}
EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);

Per my understanding, these bios have already incremented q->q_usage_counter, and q->q_usage_counter will only drop to zero once all of the requests in the queue and these bios (after they get free tags and turn into requests) have been
completed and freed from the block layer, i.e. after blk_queue_exit() has been called in
__blk_mq_free_request() for all of them. Please correct me if I am wrong.
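
To put that in code terms, the bracketing looks roughly like this (a simplified paraphrase of block/blk-mq.c, not the verbatim source; init_rq_for_tag() is a made-up stand-in for the real request initialization):

/*
 * Simplified paraphrase: where q_usage_counter is taken and dropped
 * around the lifetime of a request.
 */
static struct request *blk_mq_get_request(struct request_queue *q,
					  struct bio *bio,
					  struct blk_mq_alloc_data *data)
{
	unsigned int tag;

	blk_queue_enter_live(q);	/* q_usage_counter++ (queue known to be live) */

	tag = blk_mq_get_tag(data);	/* may sleep here until a driver tag frees up */
	if (tag == BLK_MQ_TAG_FAIL) {
		blk_queue_exit(q);	/* the reference is dropped only on failure */
		return NULL;
	}

	/* Stand-in for the real request initialization (blk_mq_rq_ctx_init);
	 * the q_usage_counter reference stays held by the request. */
	return init_rq_for_tag(data, tag);
}

void __blk_mq_free_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	/* ... release the tag back to the tag set ... */
	blk_queue_exit(q);	/* q_usage_counter--; the freeze completes once it hits 0 */
}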

Thanks,

Can Guo.


