On Mon 31-05-21 09:37:01, Ming Lei wrote:
> On Fri, May 28, 2021 at 02:26:31PM +0200, Jan Kara wrote:
> > On Fri 28-05-21 11:20:55, Ming Lei wrote:
> > > Commit 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > > starts to support io batching submission by using hctx->dispatch_busy.
> > >
> > > However, blk_mq_update_dispatch_busy() isn't changed to update hctx->dispatch_busy
> > > in that commit, so fix the issue by updating hctx->dispatch_busy in case
> > > of real scheduler.
> > >
> > > Reported-by: Jan Kara <jack@xxxxxxx>
> > > Fixes: 6e6fcbc27e77 ("blk-mq: support batching dispatch in case of io")
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > ---
> > >  block/blk-mq.c | 3 ---
> > >  1 file changed, 3 deletions(-)
> >
> > Looks good to me. You can add:
> >
> > Reviewed-by: Jan Kara <jack@xxxxxxx>
> >
> > BTW: Do you plan to submit also your improvement to
> > __blk_mq_do_dispatch_sched() to update dispatch_busy during the fetching
> > requests from the scheduler to avoid draining all requests from the IO
> > scheduler?
>
> I understand that kind of change isn't needed. When more requests are
> dequeued, hctx->dispatch_busy will be updated, then __blk_mq_do_dispatch_sched()
> won't dequeue at batch any more if either .queue_rq() returns
> STS_RESOURCE or running out of driver tag/budget.
>
> Or do you still see related issues after this patch is applied?

I suspected that __blk_mq_do_dispatch_sched() would still be pulling
requests too aggressively from the IO scheduler (which effectively defeats
the impact of cgroup IO weights on observed throughput). So I ran a few
more experiments with the workload, doing multiple iterations for each
kernel and comparing the ratios of achieved throughput when the cgroup
weights were in a 2:1 ratio. With this patch alone, I got no significant
distinction between IO from the two cgroups in 4 out of 5 test iterations.
With your patch to update max_dispatch in __blk_mq_do_dispatch_sched()
applied on top, the results were not significantly different (my previous
test result was likely a lucky chance). With my original patch to allocate
driver tags early in __blk_mq_do_dispatch_sched(), I get a reliable
distinction between cgroups: the worst ratio over all iterations is 1.4,
and the average ratio is ~1.75. This last result is btw very similar to
the ratios I see when using virtio-scsi instead of virtio-blk for the
backing storage, which is natural because virtio-scsi ends up using the
dispatch-budget logic of the SCSI subsystem.

I'm not saying my patch is the right way to do things, but it clearly
shows that __blk_mq_do_dispatch_sched() is still too aggressive in pulling
requests out of the IO scheduler.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR