> On 5 Feb 2021, at 11:16, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 3 Feb 2021, at 12:43, Jan Kara <jack@xxxxxxx> wrote:
>> 
>> On Thu 28-01-21 18:54:05, Paolo Valente wrote:
>>> 
>>> 
>>>> On 26 Jan 2021, at 17:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> 
>>>> On 1/26/21 3:50 AM, Paolo Valente wrote:
>>>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>>>>> this happens, the only active bfq_queues are bfqq and either its waker
>>>>> bfq_queue or one of its woken bfq_queues, then there is no point in
>>>>> queueing this new I/O request in bfqq for service. In fact, the
>>>>> in-service queue and bfqq agree on serving this new I/O request as
>>>>> soon as possible. So this commit puts this new I/O request directly
>>>>> into the dispatch list.
>>>>> 
>>>>> Tested-by: Jan Kara <jack@xxxxxxx>
>>>>> Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx>
>>>>> ---
>>>>> block/bfq-iosched.c | 17 ++++++++++++++++-
>>>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>>> index a83149407336..e5b83910fbe0 100644
>>>>> --- a/block/bfq-iosched.c
>>>>> +++ b/block/bfq-iosched.c
>>>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>> 
>>>>> 	spin_lock_irq(&bfqd->lock);
>>>>> 	bfqq = bfq_init_rq(rq);
>>>>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>>> +
>>>>> +	/*
>>>>> +	 * Additional case for putting rq directly into the dispatch
>>>>> +	 * queue: the only active bfq_queues are bfqq and either its
>>>>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>>>>> +	 * case, there is no point in queueing rq in bfqq for
>>>>> +	 * service. In fact, the in-service queue and bfqq agree on
>>>>> +	 * serving this new I/O request as soon as possible.
>>>>> +	 */
>>>>> +	if (!bfqq ||
>>>>> +	    (bfqq != bfqd->in_service_queue &&
>>>>> +	     bfqd->in_service_queue != NULL &&
>>>>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>>>>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>>>>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>>>>> +	    at_head || blk_rq_is_passthrough(rq)) {
>>>>> 		if (at_head)
>>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>>> 		else
>>>>> 
>>>> 
>>>> This is unreadable... Just seems like you are piling heuristics in to
>>>> catch some case, and it's neither readable nor clean.
>>>> 
>>> 
>>> Yeah, these comments inappropriately assume that the reader knows the
>>> waker mechanism in depth. And they do not stress at all how important
>>> this improvement is.
>>> 
>>> I'll do my best to improve these comments.
>>> 
>>> To try to do a better job, let me also explain the matter here in
>>> advance. Maybe you or others can give me some early feedback (or just
>>> tell me to proceed).
>>> 
>>> This change is one of the main improvements that boosted throughput
>>> in Jan's tests. Here is the rationale:
>>> - consider a bfq_queue, say Q1, detected as a waker of another
>>> bfq_queue, say Q2
>>> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O
>>> of Q1 needs to be completed for new I/O of Q1 to arrive. A notable
>>                                             ^^ Q2?
>> 
> 
> Yes, thank you!
> 
> (after this interaction, I'll fix and improve all of this description
> according to your comments)
> 
>>> example is journald
>>> - so, Q1 and Q2 are in all respects two cooperating processes: if the
>>> service of Q1's I/O is delayed, Q2 can only suffer from it.
>>> Conversely, if Q2's I/O is delayed, the purpose of Q1 is just
>>> defeated.
>> 
>> What do you exactly mean by this last sentence?
> 
> By the definition of a waker, the purpose of Q1's I/O is doing what
> needs to be done so that new I/O from Q2 can finally be issued.
> Delaying Q2's I/O is the opposite of this goal.
> 
>> 
>>> - as a consequence, if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>>> only queue in service, there is absolutely no point in delaying the
>>> service of such an I/O. The only possible result is a throughput
>>> loss, as detected by Jan's tests
>> 
>> If we are idling at that moment waiting for more IO from the
>> in-service queue, I agree.
> 
> And I agree too, if the drive has no internal queueing, has no
> parallelism or pipelining, or is at least one order of magnitude
> slower than the rate at which the CPU processes I/O. In all other
> cases, serving the I/O of only one queue at a time means throwing away
> throughput. For example, on a consumer SSD, moving from one to two I/O
> threads served in parallel usually means doubling the throughput.
> 
> So, the best thing to do, if all the above conditions are met, is to
> have this new I/O dispatched as soon as possible.
> 
> The most efficient way to attain this goal is to just put the new I/O
> directly into the dispatch list.
> 
>> But that doesn't seem to be part of your condition above?
>> 
>>> - so, when the above condition holds, the most effective and
>>> efficient action is to put the new I/O directly in the dispatch list
>>> - as an additional restriction, Q1 and Q2 must be the only busy
>>> queues for this commit to put the I/O of Q2/Q1 in the dispatch list.
>>> This is necessary because, if other queues are also waiting for
>>> service, then putting new I/O directly in the dispatch list may
>>> evidently cause a violation of service guarantees for those queues
>> 
>> This last restriction is not ideal for cases like the jbd2 thread,
>> since it may still lead to pointless idling, but I understand that
>> without some restriction like this, several waking threads could just
>> starve other ones.
> 
> Yeah, the goal here is to reduce false positives a little bit.
> 
>> So I guess it's fine for now.
>> 
> 
> Yes, hopefully experience will lead us to further improvements or even
> better solutions.
> 

Hi Jens,
in a separate thread, Jan told me that my last reply, and therefore
also this patch, are ok for him.
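
To also address the readability concern concretely, for the V2 I'm
thinking of factoring the new condition out of bfq_insert_request()
into a small helper. Below is a rough, untested sketch (the helper name
is tentative); it is just the condition from the patch above, restated
one piece at a time:

static bool bfq_bypass_to_dispatch(struct bfq_data *bfqd,
				   struct bfq_queue *bfqq)
{
	struct bfq_queue *in_serv = bfqd->in_service_queue;

	/* Some queue other than bfqq must be in service. */
	if (!in_serv || bfqq == in_serv)
		return false;

	/*
	 * The in-service queue and, if busy, bfqq must be the only
	 * busy queues; otherwise bypassing bfqq could violate the
	 * service guarantees of the other waiting queues.
	 */
	if (bfq_tot_busy_queues(bfqd) != 1 + bfq_bfqq_busy(bfqq))
		return false;

	/* One of the two queues must be the waker of the other. */
	return bfqq->waker_bfqq == in_serv || in_serv->waker_bfqq == bfqq;
}

The test in bfq_insert_request() would then reduce to

	if (!bfqq || bfq_bypass_to_dispatch(bfqd, bfqq) ||
	    at_head || blk_rq_is_passthrough(rq)) {

with the long comment moved on top of the helper.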

May I now proceed with a V2, in which I'll include my extra comments?
Or are there some other issues for you?

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> 								Honza
>> -- 
>> Jan Kara <jack@xxxxxxxx>
>> SUSE Labs, CR