On Mon 31-05-21 09:37:01, Ming Lei wrote:
> On Fri, May 28, 2021 at 02:26:31PM +0200, Jan Kara wrote:
> > On Fri 28-05-21 11:20:55, Ming Lei wrote:
> > > Commit 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > > starts to support io batching submission by using hctx->dispatch_busy.
> > >
> > > However, blk_mq_update_dispatch_busy() isn't changed to update hctx->dispatch_busy
> > > in that commit, so fix the issue by updating hctx->dispatch_busy in case
> > > of real scheduler.
> > >
> > > Reported-by: Jan Kara <jack@xxxxxxx>
> > > Fixes: 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > ---
> > >  block/blk-mq.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> >
> > Looks good to me. You can add:
> >
> > Reviewed-by: Jan Kara <jack@xxxxxxx>
> >
> > BTW: Do you plan to submit also your improvement to
> > __blk_mq_do_dispatch_sched() to update dispatch_busy during the fetching
> > requests from the scheduler to avoid draining all requests from the IO
> > scheduler?
>
> I understand that kind of change isn't needed. When more requests are
> dequeued, hctx->dispatch_busy will be updated, then __blk_mq_do_dispatch_sched()
> won't dequeue at batch any more if either .queue_rq() returns
> STS_RESOURCE or running out of driver tag/budget.
>
> Or do you still see related issues after this patch is applied?

I suspected that __blk_mq_do_dispatch_sched() would still be pulling
requests too aggressively from the IO scheduler (which effectively defeats
the impact of cgroup IO weights on observed throughput). So I ran a few
more experiments with the workload, doing multiple iterations for each
kernel and comparing the ratios of achieved throughput when the cgroup
weights were in a 2:1 ratio. With this patch alone, I got no significant
distinction between IO from the two cgroups in 4 out of 5 test iterations.
With your patch to update max_dispatch in __blk_mq_do_dispatch_sched()
applied on top, the results were not significantly different (my previous
test result was likely a lucky chance). With my original patch to allocate
driver tags early in __blk_mq_do_dispatch_sched(), I get a reliable
distinction between cgroups: the worst ratio over all iterations is 1.4,
and the average ratio is ~1.75. This last result is btw very similar to
the ratios I see when using virtio-scsi instead of virtio-blk for the
backing storage, which is natural because virtio-scsi ends up using the
dispatch-budget logic of the SCSI subsystem.

I'm not saying my patch is the right way to do things, but it clearly
shows that __blk_mq_do_dispatch_sched() is still too aggressive in pulling
requests out of the IO scheduler.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR