Re: [bug report] shared tags causes IO hang and performance drop

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 27 Apr 2021 17:52:58 +0800

On Tue, Apr 27, 2021 at 10:37:39AM +0100, John Garry wrote:
> On 27/04/2021 10:11, Ming Lei wrote:
> > On Tue, Apr 27, 2021 at 08:52:53AM +0100, John Garry wrote:
> > > On 27/04/2021 00:59, Ming Lei wrote:
> > > > > Anyway, I'll look at adding code for a per-request queue sched tags to see
> > > > > if it helps. But I would plan to continue to use a per hctx sched request
> > > > > pool.
> > > > Why not switch to per hctx sched request pool?
> > > I don't understand. The current code uses a per-hctx sched request pool, and
> > > I said that I don't plan to change that.
> > I forget why you didn't do that, because for hostwide tags, requests
> > are always 1:1 with either sched tags (real io sched) or driver tags (none).
> > 
> > Maybe you want to keep requests local to the hctx, but I have never seen
> > performance data supporting that point; the sbitmap queue allocator is
> > already intelligent enough to allocate a tag freed from the native cpu.
> > 
> > Then you just waste lots of memory; I remember that the scsi request
> > payload is a bit big.
> 
> It's true that we waste a lot of memory on regular static requests when
> using hostwide tags today.
> 
> One problem in trying to use a single set of "hostwide" static requests is
> that we call blk_mq_init_request(..., hctx_idx, ...) ->
> set->ops->init_request(.., hctx_idx, ...) for each static rq, and this would
> not work for a single set of "hostwide" requests.
> 
> And I see a similar problem for "request queue-wide" sched static
> requests.
> 
> Maybe we can improve this in future.

OK, fair enough.

> 
> BTW, for the performance issue which Yanhui witnessed with megaraid sas, do
> you think it may be because of the IO sched tags issue, with total sched tag
> depth growing vs driver tags?

I think that is quite likely. Will you work on a patch to convert to
per-request-queue sched tags?

> Are there lots of LUNs? I can imagine that megaraid
> sas has much larger can_queue than scsi_debug :)

No, there are just two LUNs. The 1st LUN is a commodity SSD (queue depth
32), and the performance issue is reported on this LUN. The other is an
HDD (queue depth 256), which is the root disk, but the megaraid host tag
depth is 228, another weird setting. The issue can still be reproduced after
we set the 2nd LUN's depth to 64 to avoid driver tag contention.
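
To put rough numbers on the sched tag vs driver tag point, here is a small
Python sketch. It is only an illustration, not kernel code: the hw queue
count (16) and the sched depth per tag set (256) are assumed values that are
not from this report; only the host tag depth of 228 is taken from the setup
above. It compares the total number of sched tags when each hctx has its own
set against a single per-request-queue set.

```python
def total_sched_tags(nr_sched_tag_sets, depth_per_set):
    """Total sched tags one request queue can hand out."""
    return nr_sched_tag_sets * depth_per_set


def excess_over_driver_tags(sched_tags, host_tag_depth):
    """Requests that may hold a sched tag while no driver tag is left."""
    return max(0, sched_tags - host_tag_depth)


HOST_TAG_DEPTH = 228  # megaraid host tag depth, from the report
NR_HW_QUEUES = 16     # assumption: hypothetical hw queue count
SCHED_DEPTH = 256     # assumption: hypothetical depth of one sched tag set

# One sched tag set per hctx: the total grows with the number of hw queues.
per_hctx_total = total_sched_tags(NR_HW_QUEUES, SCHED_DEPTH)

# One sched tag set per request queue: the total stays at one set's depth.
per_queue_total = total_sched_tags(1, SCHED_DEPTH)

print("per-hctx sched tags:", per_hctx_total)
print("per-request-queue sched tags:", per_queue_total)
print("excess over driver tags (per-hctx):",
      excess_over_driver_tags(per_hctx_total, HOST_TAG_DEPTH))
print("excess over driver tags (per-request-queue):",
      excess_over_driver_tags(per_queue_total, HOST_TAG_DEPTH))
```

With these assumed values the per-hctx layout allows far more requests to
hold a sched tag than there are driver tags to dispatch them with, while the
per-request-queue layout keeps the two depths close.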

Thanks,
Ming