On Mon, Apr 26, 2021 at 11:53:45AM +0100, John Garry wrote:
> On 23/04/2021 09:43, John Garry wrote:
> > > 1) randread test on ibm-x3850x6[*] with deadline
> > >
> > >              | IOPS | FIO CPU util
> > > ------------------------------------------------
> > > hosttags     | 94k  | usr=1.13%, sys=14.75%
> > > ------------------------------------------------
> > > non hosttags | 124k | usr=1.12%, sys=10.65%
> >
> > Getting these results for mq-deadline:
> > hosttags
> > 100K cpu 1.52 4.47
> >
> > non-hosttags
> > 109K cpu 1.74 5.49
> >
> > So I still don't see the same CPU usage increase for hosttags.
> >
> > But throughput is down, so at least I can check on that...
> >
> > >
> > > 2) randread test on ibm-x3850x6[*] with none
> > >              | IOPS | FIO CPU util
> > > ------------------------------------------------
> > > hosttags     | 120k | usr=0.89%, sys=6.55%
> > > ------------------------------------------------
> > > non hosttags | 121k | usr=1.07%, sys=7.35%
> > > ------------------------------------------------
> >
> > Here I get:
> > hosttags
> > 113K cpu 2.04 5.83
> >
> > non-hosttags
> > 108K cpu 1.71 5.05
>
> Hi Ming,
>
> One thing I noticed for the non-hosttags scenario is that I am hitting
> the IO scheduler tag exhaustion path in blk_mq_get_tag() often; here's
> some perf output:
>
> |--15.88%--blk_mq_submit_bio
> |          |
> |          |--11.27%--__blk_mq_alloc_request
> |          |          |
> |          |           --11.19%--blk_mq_get_tag
> |          |          |
> |          |          |--6.00%--__blk_mq_delay_run_hw_queue
> |          |          |          |
>
> ...
>
> |          |          |
> |          |          |--3.29%--io_schedule
> |          |          |          |
>
> ....
>
> |          |          |          |
> |          |          |           --1.32%--io_schedule_prepare
> |          |          |
>
> ...
>
> |          |          |
> |          |          |--0.60%--sbitmap_finish_wait
> |          |          |
>  --0.56%--sbitmap_get
>
> I don't see this for hostwide tags - this may be because we have
> multiple hctx, and the IO sched tags are per hctx, so there is less
> chance of exhaustion. But this is not specific to hostwide tags; it
> applies to multiple HW queues in general. As I understood, sched tags
> were meant to be per request queue, right? Am I reading this correctly?

Sched tags are still per-hctx. I just found that you didn't convert sched
tags into per-request-queue shared tags. So with hostwide tags, each hctx
still has its own standalone sched tags and request pool; that is one big
difference from non-hostwide tags, and it is why you observe that
scheduler tag exhaustion is easy to trigger in the non-hostwide case.

I'd suggest adding one set of per-request-queue sched tags and making all
hctxs share it, just like what you did for the driver tags. A rough
sketch of the idea is below.

Thanks,
Ming
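
To make the suggestion concrete, here is a minimal sketch of what sharing
sched tags across hctxs might look like. It assumes a new, hypothetical
request_queue field 'shared_sched_tags' and a placeholder allocation helper
blk_mq_alloc_shared_sched_tags(); the existing hctx->sched_tags field and
the queue_for_each_hw_ctx() iterator are real, but the actual tags and
request-pool allocation would have to reuse whatever helpers the tree
provides, so this is an illustration rather than a patch.

/*
 * Sketch only: give the request queue a single sched tags instance and
 * let every hctx point at it, mirroring how hostwide driver tags are
 * already shared. 'q->shared_sched_tags' is a hypothetical new field and
 * blk_mq_alloc_shared_sched_tags() a hypothetical stand-in for the real
 * tags + static request-pool allocation, which is elided here.
 */
static int blk_mq_init_shared_sched_tags(struct request_queue *q,
					 unsigned int nr_requests)
{
	struct blk_mq_hw_ctx *hctx;
	unsigned int i;

	/* One sched tags + request pool for the whole queue. */
	q->shared_sched_tags = blk_mq_alloc_shared_sched_tags(q, nr_requests);
	if (!q->shared_sched_tags)
		return -ENOMEM;

	/* Every hctx shares the same sched tags, as driver tags already do. */
	queue_for_each_hw_ctx(q, hctx, i)
		hctx->sched_tags = q->shared_sched_tags;

	return 0;
}

Teardown would need the matching change: free the shared sched tags once
per queue rather than once per hctx. With that, the sched tag space seen
by submitters would be the same in the hostwide and non-hostwide cases,
which should make the exhaustion behaviour comparable.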