> Il giorno 5 apr 2020, alle ore 18:16, Doug Anderson <dianders@xxxxxxxxxxxx> ha scritto: > > Hi, > > On Sun, Apr 5, 2020 at 7:55 AM Paolo Valente <paolo.valente@xxxxxxxxxx> wrote: >> >>> Il giorno 5 apr 2020, alle ore 16:00, Doug Anderson <dianders@xxxxxxxxxxxx> ha scritto: >>> >>> Hi, >>> >>> On Sun, Apr 5, 2020 at 2:15 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote: >>>> >>>> OK, looks it isn't specific on BFQ any more. >>>> >>>> Follows another candidate approach for this issue, given it is so hard >>>> to trigger, we can make it more reliable by rerun queue when has_work() >>>> returns true after ops->dispath_request() returns NULL. >>>> >>>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c >>>> index 74cedea56034..4408e5d4fcd8 100644 >>>> --- a/block/blk-mq-sched.c >>>> +++ b/block/blk-mq-sched.c >>>> @@ -80,6 +80,7 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx) >>>> blk_mq_run_hw_queue(hctx, true); >>>> } >>>> >>>> +#define BLK_MQ_BUDGET_DELAY 3 /* ms units */ >>>> /* >>>> * Only SCSI implements .get_budget and .put_budget, and SCSI restarts >>>> * its queue by itself in its completion handler, so we don't need to >>>> @@ -103,6 +104,9 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) >>>> rq = e->type->ops.dispatch_request(hctx); >>>> if (!rq) { >>>> blk_mq_put_dispatch_budget(hctx); >>>> + >>>> + if (e->type->ops.has_work && e->type->ops.has_work(hctx)) >>>> + blk_mq_delay_run_hw_queue(hctx, BLK_MQ_BUDGET_DELAY); >>> >>> I agree that your patch should solve the race. With the current BFQ's >>> has_work() it's a bit of a disaster though. It will essentially put >>> blk-mq into a busy-wait loop (with a 3 ms delay between each poll) >>> while BFQ's has_work() says "true" but BFQ doesn't dispatch anything. >>> >>> ...so I guess the question that still needs to be answered: does >>> has_work() need to be exact? If so then we need the patch you propose >>> plus one to BFQ. If not, we should continue along the lines of my >>> patch. >>> >> >> Some more comments. BFQ's I/O plugging lasts 9 ms by default. So, >> with this last Ming's patch, BFQ may happen to be polled every 3ms, >> for at most three times. > > Ah! I did not know this. OK, then Ming's patch seems like it should > work. If nothing else it should fix the problem. If this ends up > making BFQ chew up too much CPU time then presumably someone will > notice and BFQ's has_work() can be improved. > > Ming: how do you want to proceed? Do you want to formally post the > patch? Do you want me to post a v3 of my series where I place patch > #2 with your patch? Do you want authorship (which implies adding your > Signed-off-by)? > > >> On the opposite end, making bfq_has_work plugging aware costs more >> complexity, and possibly one more lock. While avoiding the above >> occasional polling, this may imply a lot of overhead or CPU stalls on >> every dispatch. > > I still think it would be interesting to run performance tests with my > proof-of-concept solution for has_work(). Even if it's not ideal, > knowing whether performance increased, decreased, or stayed the same > would give information about how much more effort should be put into > this. > Why not? It is however hard to hope that we add only negligible overhead and CPU stalls if we move from one lock-protected section per I/O-request dispatch, to two or more lock-protected sections per request (has_work may be invoked several times per request). At any rate, if useful, one of the scripts in my S benchmark suite can also measure max IOPS (when limited only by I/O processing) [1]. The script is for Linux distros; I don't know whether it works in your environments of interest, Doug. Paolo [1] https://github.com/Algodev-github/S/blob/master/throughput-sync/throughput-sync.sh > -Doug