> Il giorno 3 feb 2021, alle ore 12:43, Jan Kara <jack@xxxxxxx> ha scritto: > > On Thu 28-01-21 18:54:05, Paolo Valente wrote: >> >> >>> Il giorno 26 gen 2021, alle ore 17:18, Jens Axboe <axboe@xxxxxxxxx> ha scritto: >>> >>> On 1/26/21 3:50 AM, Paolo Valente wrote: >>>> Consider a new I/O request that arrives for a bfq_queue bfqq. If, when >>>> this happens, the only active bfq_queues are bfqq and either its waker >>>> bfq_queue or one of its woken bfq_queues, then there is no point in >>>> queueing this new I/O request in bfqq for service. In fact, the >>>> in-service queue and bfqq agree on serving this new I/O request as >>>> soon as possible. So this commit puts this new I/O request directly >>>> into the dispatch list. >>>> >>>> Tested-by: Jan Kara <jack@xxxxxxx> >>>> Signed-off-by: Paolo Valente <paolo.valente@xxxxxxxxxx> >>>> --- >>>> block/bfq-iosched.c | 17 ++++++++++++++++- >>>> 1 file changed, 16 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c >>>> index a83149407336..e5b83910fbe0 100644 >>>> --- a/block/bfq-iosched.c >>>> +++ b/block/bfq-iosched.c >>>> @@ -5640,7 +5640,22 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq, >>>> >>>> spin_lock_irq(&bfqd->lock); >>>> bfqq = bfq_init_rq(rq); >>>> - if (!bfqq || at_head || blk_rq_is_passthrough(rq)) { >>>> + >>>> + /* >>>> + * Additional case for putting rq directly into the dispatch >>>> + * queue: the only active bfq_queues are bfqq and either its >>>> + * waker bfq_queue or one of its woken bfq_queues. In this >>>> + * case, there is no point in queueing rq in bfqq for >>>> + * service. In fact, the in-service queue and bfqq agree on >>>> + * serving this new I/O request as soon as possible. >>>> + */ >>>> + if (!bfqq || >>>> + (bfqq != bfqd->in_service_queue && >>>> + bfqd->in_service_queue != NULL && >>>> + bfq_tot_busy_queues(bfqd) == 1 + bfq_bfqq_busy(bfqq) && >>>> + (bfqq->waker_bfqq == bfqd->in_service_queue || >>>> + bfqd->in_service_queue->waker_bfqq == bfqq)) || >>>> + at_head || blk_rq_is_passthrough(rq)) { >>>> if (at_head) >>>> list_add(&rq->queuelist, &bfqd->dispatch); >>>> else >>>> >>> >>> This is unreadable... Just seems like you are piling heuristics in to >>> catch some case, and it's neither readable nor clean. >>> >> >> Yeah, these comments inappropriately assume that the reader knows the >> waker mechanism in depth. And they do not stress at all how important >> this improvement is. >> >> I'll do my best to improve these comments. >> >> To try to do a better job, let me also explain the matter early here. >> Maybe you or others can give me some early feedback (or just tell me >> to proceed). >> >> This change is one of the main improvements that boosted >> throughput in Jan's tests. Here is the rationale: >> - consider a bfq_queue, say Q1, detected as a waker of another >> bfq_queue, say Q2 >> - by definition of a waker, Q1 blocks the I/O of Q2, i.e., some I/O of >> of Q1 needs to be completed for new I/O of Q1 to arrive. A notable > ^^ Q2? > Yes, thank you! (after this interaction, I'll fix and improve all this description, according to your comments) >> example is journald >> - so, Q1 and Q2 are in any respect two cooperating processes: if the >> service of Q1's I/O is delayed, Q2 can only suffer from it. >> Conversely, if Q2's I/O is delayed, the purpose of Q1 is just defeated. > > What do you exactly mean by this last sentence? By definition of waker, the purpose of Q1's I/O is doing what needs to be done, so that new Q2's I/O can finally be issued. Delaying Q2's I/O is the opposite of this goal. > >> - as a consequence if some I/O of Q1/Q2 arrives while Q2/Q1 is the >> only queue in service, there is absolutely no point in delaying the >> service of such an I/O. The only possible result is a throughput >> loss, detected by Jan's test > > If we are idling at that moment waiting for more IO from in service queue, > I agree. And I agree too, if the drive has no internal queueing, has no parallelism or pipeline, or is at least one order of magnitude slower than the CPU is processing I/O. In all other cases, serving the I/O of only one queue at a time means throwing away throughput. For example, on a consumer SSD, moving from one to two I/O threads served in parallel usually means doubling the throughput. So, the best thing to do, if all the above conditions are met, is to have this new I/O dispatched as soon as possible. The most efficient way to attain this goal is to just put the new I/O directly into the dispatch list. > But that doesn't seem to be part of your condition above? > >> - so, when the above condition holds, the most effective and efficient >> action is to put the new I/O directly in the dispatch list >> - as an additional restriction, Q1 and Q2 must be the only busy queues >> for this commit to put the I/O of Q2/Q1 in the dispatch list. This is >> necessary, because, if also other queues are waiting for service, then >> putting new I/O directly in the dispatch list may evidently cause a >> violation of service guarantees for the other queues > > This last restriction is not ideal for cases like jbd2 thread since it may > still lead to pointless idling but I understand that without some > restriction like this several waking threads could just starve other ones. Yeah, the goal here is to reduce a little bit false positives. > So I guess it's fine for now. > Yes, hopefully experience will lead us to even improvements or even better solutions. Thanks, Paolo > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR