Re: [EXT] Re: [PATCH v5 4/4] ufs: Simplify the clock scaling mechanism implementation

On 11/18/19 9:33 PM, cang@xxxxxxxxxxxxxx wrote:
Aside from the functionality, we need to take this change carefully as it may affect performance. With the old implementation, when devfreq decides to scale up, we can make sure that all requests sent to the UFS device after this point will go through the highest UFS gear. Scaling up will happen right after the doorbell is cleared, without even needing to wait until the requests are freed from the block layer.
And 1 sec is long enough for the HW to clear out the doorbell.

In the new implementation of ufshcd_clock_scaling_prepare(), after blk_freeze_queue_start(), we call blk_mq_freeze_queue_wait_timeout() to wait for 1 sec. In addition to the requests that have already been queued to the HW doorbell, more requests will be dispatched within that 1 sec, still through the lowest gear. My first impression is that it might be a bit sluggish, and I am not sure how much it may affect the benchmarks.

And if the request queues are heavily loaded (many bios waiting for a free tag to become a request), is 1 sec long enough to freeze all these request queues? If not, performance would be affected if scaling up fails frequently.
As per my understanding, the request queue becomes frozen only after all the existing requests, and the bios already waiting for a free tag on the queue, have been completed and freed from the block layer.
Please correct me if I am wrong.

Whereas in the old implementation, when devfreq decides to scale up, scaling can happen almost every time, barring unexpected HW issues.

Hi Can,

I agree with your description above of the current behavior of the UFS driver, but I do not agree with your interpretation of the behavior of the changes I proposed. The implementation of request queue freezing in the block layer is as follows:

void blk_freeze_queue_start(struct request_queue *q)
{
	mutex_lock(&q->mq_freeze_lock);
	if (++q->mq_freeze_depth == 1) {
		percpu_ref_kill(&q->q_usage_counter);
		mutex_unlock(&q->mq_freeze_lock);
		if (queue_is_mq(q))
			blk_mq_run_hw_queues(q, false);
	} else {
		mutex_unlock(&q->mq_freeze_lock);
	}
}

As long as calls from other threads to blk_mq_freeze_queue()/blk_mq_unfreeze_queue() are balanced, after blk_freeze_queue_start() returns it is guaranteed that q->q_usage_counter is in __PERCPU_REF_DEAD mode. Calls to blk_get_request() etc. block in that mode until the request queue is unfrozen. See also blk_queue_enter().

In other words, calling blk_freeze_queue() from inside ufshcd_clock_scaling_prepare() should preserve the current behavior, namely that ufshcd_clock_scaling_prepare() waits for ongoing requests to finish and also that submission of new requests is postponed until clock scaling has finished. Do you agree with this analysis?

Thanks,

Bart.


