> On 5 Feb 2021, at 11:16, Paolo Valente <paolo.valente@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 3 Feb 2021, at 12:43, Jan Kara <jack@xxxxxxx> wrote:
>> 
>> On Thu 28-01-21 18:54:05, Paolo Valente wrote:
>>> 
>>> 
>>>> On 26 Jan 2021, at 17:18, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> 
>>>> On 1/26/21 3:50 AM, Paolo Valente wrote:
>>>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when
>>>>> this happens, the only active bfq_queues are bfqq and either its waker
>>>>> bfq_queue or one of its woken bfq_queues, then there is no point in
>>>>> queueing this new I/O request in bfqq for service. In fact, the
>>>>> in-service queue and bfqq agree on serving this new I/O request as
>>>>> soon as possible. So this commit puts this new I/O request directly
>>>>> into the dispatch list.
>>>>> 
>>>>> Tested-by: Jan Kara <jack@xxxxxxx>
>>>>> Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx>
>>>>> ---
>>>>> block/bfq-iosched.c | 17 ++++++++++++++++-
>>>>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>>>> 
>>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>>>> index a83149407336..e5b83910fbe0 100644
>>>>> --- a/block/bfq-iosched.c
>>>>> +++ b/block/bfq-iosched.c
>>>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>>> 
>>>>> 	spin_lock_irq(&bfqd->lock);
>>>>> 	bfqq = bfq_init_rq(rq);
>>>>> -	if (!bfqq || at_head || blk_rq_is_passthrough(rq)) {
>>>>> +
>>>>> +	/*
>>>>> +	 * Additional case for putting rq directly into the dispatch
>>>>> +	 * queue: the only active bfq_queues are bfqq and either its
>>>>> +	 * waker bfq_queue or one of its woken bfq_queues. In this
>>>>> +	 * case, there is no point in queueing rq in bfqq for
>>>>> +	 * service. In fact, the in-service queue and bfqq agree on
>>>>> +	 * serving this new I/O request as soon as possible.
>>>>> +	 */
>>>>> +	if (!bfqq ||
>>>>> +	    (bfqq != bfqd->in_service_queue &&
>>>>> +	     bfqd->in_service_queue != NULL &&
>>>>> +	     bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) &&
>>>>> +	     (bfqq->waker_bfqq == bfqd->in_service_queue ||
>>>>> +	      bfqd->in_service_queue->waker_bfqq == bfqq)) ||
>>>>> +	    at_head || blk_rq_is_passthrough(rq)) {
>>>>> 		if (at_head)
>>>>> 			list_add(&rq->queuelist, &bfqd->dispatch);
>>>>> 		else
>>>>> 
>>>> 
>>>> This is unreadable... Just seems like you are piling heuristics in to
>>>> catch some case, and it's neither readable nor clean.
>>>> 
>>> 
>>> Yeah, these comments inappropriately assume that the reader knows the
>>> waker mechanism in depth. And they do not stress at all how important
>>> this improvement is.
>>> 
>>> I'll do my best to improve these comments.
>>> 
>>> To try to do a better job, let me also explain the matter here in
>>> advance. Maybe you or others can give me some early feedback (or just
>>> tell me to proceed).
>>> 
>>> This change is one of the main improvements that boosted throughput
>>> in Jan's tests. Here is the rationale:
>>> - consider a bfq_queue, say Q1, detected as a waker of another
>>> bfq_queue, say Q2
>>> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O
>>> of Q1 needs to be completed for new I/O of Q1 to arrive. A notable
>>                                             ^^ Q2?
>> 
> 
> Yes, thank you!
> 
> (after this interaction, I'll fix and improve all of this description
> according to your comments)
> 
>>> example is journald
>>> - so, Q1 and Q2 are in all respects two cooperating processes: if the
>>> service of Q1's I/O is delayed, Q2 can only suffer from it.
>>> Conversely, if Q2's I/O is delayed, the purpose of Q1 is just
>>> defeated.
>> 
>> What do you exactly mean by this last sentence?
> 
> By the definition of a waker, the purpose of Q1's I/O is doing what
> needs to be done so that new I/O from Q2 can finally be issued.
> Delaying Q2's I/O is the opposite of this goal.
> 
>> 
>>> - as a consequence, if some I/O of Q1/Q2 arrives while Q2/Q1 is the
>>> only queue in service, there is absolutely no point in delaying the
>>> service of such an I/O. The only possible result is a throughput
>>> loss, as detected by Jan's tests
>> 
>> If we are idling at that moment waiting for more IO from the
>> in-service queue, I agree.
> 
> And I agree too, if the drive has no internal queueing, has no
> parallelism or pipelining, or is at least one order of magnitude
> slower than the rate at which the CPU processes I/O. In all other
> cases, serving the I/O of only one queue at a time means throwing away
> throughput. For example, on a consumer SSD, moving from one to two I/O
> threads served in parallel usually means doubling the throughput.
> 
> So, the best thing to do, if all the above conditions are met, is to
> have this new I/O dispatched as soon as possible.
> 
> The most efficient way to attain this goal is to just put the new I/O
> directly into the dispatch list.
> 
>> But that doesn't seem to be part of your condition above?
>> 
>>> - so, when the above condition holds, the most effective and
>>> efficient action is to put the new I/O directly in the dispatch list
>>> - as an additional restriction, Q1 and Q2 must be the only busy
>>> queues for this commit to put the I/O of Q2/Q1 in the dispatch list.
>>> This is necessary because, if other queues are also waiting for
>>> service, then putting new I/O directly in the dispatch list may
>>> evidently cause a violation of service guarantees for those queues
>> 
>> This last restriction is not ideal for cases like the jbd2 thread,
>> since it may still lead to pointless idling, but I understand that
>> without some restriction like this, several waking threads could just
>> starve other ones.
> 
> Yeah, the goal here is to reduce false positives a little bit.
> 
>> So I guess it's fine for now.
>> 
> 
> Yes, hopefully experience will lead us to further improvements or even
> better solutions.
> 

Hi Jens,
in a separate thread, Jan told me that my last reply, and therefore
also this patch, are ok for him.
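
To also address the readability concern concretely, for the V2 I'm
thinking of factoring the new condition out of bfq_insert_request()
into a small helper. Below is a rough, untested sketch (the helper name
is tentative); it is just the condition from the patch above, restated
one piece at a time:

static bool bfq_bypass_to_dispatch(struct bfq_data *bfqd,
				   struct bfq_queue *bfqq)
{
	struct bfq_queue *in_serv = bfqd->in_service_queue;

	/* Some queue other than bfqq must be in service. */
	if (!in_serv || bfqq == in_serv)
		return false;

	/*
	 * The in-service queue and, if busy, bfqq must be the only
	 * busy queues; otherwise bypassing bfqq could violate the
	 * service guarantees of the other waiting queues.
	 */
	if (bfq_tot_busy_queues(bfqd) != 1 + bfq_bfqq_busy(bfqq))
		return false;

	/* One of the two queues must be the waker of the other. */
	return bfqq->waker_bfqq == in_serv || in_serv->waker_bfqq == bfqq;
}

The test in bfq_insert_request() would then reduce to

	if (!bfqq || bfq_bypass_to_dispatch(bfqd, bfqq) ||
	    at_head || blk_rq_is_passthrough(rq)) {

with the long comment moved on top of the helper.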

May I now proceed with a V2, in which I'll include my extra comments?
Or are there some other issues for you?

Thanks,
Paolo

> Thanks,
> Paolo
> 
>> 								Honza
>> -- 
>> Jan Kara <jack@xxxxxxxx>
>> SUSE Labs, CR