On Thu, 2018-01-18 at 12:03 -0500, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at 11:50am -0500,
> Bart Van Assche <bart.vanassche@xxxxxxx> wrote:
> >
> > My comments about the above are as follows:
> > - It can take up to q->rq_timeout jiffies after a .queue_rq()
> >   implementation returned BLK_STS_RESOURCE before blk_mq_timeout_work()
> >   gets called. However, it can happen that only a few milliseconds after
> >   .queue_rq() returned BLK_STS_RESOURCE the condition that caused it to
> >   return BLK_STS_RESOURCE gets cleared. So the above approach can
> >   result in long delays during which it will seem like the queue got
> >   stuck. Additionally, I think that the block driver should decide how
> >   long it takes before a queue is rerun and not the block layer core.
>
> So configure q->rq_timeout to be shorter? It is configurable through
> blk_mq_tag_set's 'timeout' member. It apparently defaults to 30 * HZ.
>
> That is the problem with timeouts, there is generally no one size fits
> all.

Sorry but I think that would be wrong. The delay after which a queue is
rerun should not be coupled to the request timeout. These two should be
independent.

> > - The lockup that I reported only occurs with the dm driver but not any
> >   other block driver. So why modify the block layer core since this
> >   can be fixed by modifying the dm driver?
>
> Hard to know it is only DM's blk-mq that is impacted. That is the only
> blk-mq driver that you're testing like this (that is also able to handle
> faults, etc).

That's not correct. I'm also testing the SCSI core, which is one of the
most complicated block drivers.

> > - A much simpler fix and a fix that is known to work exists, namely
> >   inserting a blk_mq_delay_run_hw_queue() call in the dm driver.
>
> Because your "much simpler" fix actively hurts performance, as is
> detailed in this header:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=ec3eaf9a673106f66606896aed6ddd20180b02ec

We are close to the start of the merge window so I think it's better to
fall back to an old approach that is known to work than to keep a new
approach that is known not to work. Additionally, the performance issue
you referred to only affects IOPS and bandwidth by more than 1% with the
lpfc driver, and that is because the queue depth it supports is much
lower than for other SCSI HBAs, namely 3 instead of 64.

Thanks,

Bart.
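
P.S. To make it clear what kind of change I mean, here is a minimal
sketch of a .queue_rq() implementation that reruns its own hw queue
after a driver-chosen delay instead of waiting for the request timeout.
The names my_queue_rq() and my_resources_available(), and the 100 ms
delay, are made up for illustration; the real change would of course go
into the dm request-queue code, not a new driver:

	static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
					const struct blk_mq_queue_data *bd)
	{
		if (!my_resources_available(hctx)) {
			/*
			 * Ask blk-mq to rerun this hw queue after a short,
			 * driver-chosen delay rather than leaving it idle
			 * until blk_mq_timeout_work() fires after
			 * q->rq_timeout jiffies.
			 */
			blk_mq_delay_run_hw_queue(hctx, 100 /* ms */);
			return BLK_STS_RESOURCE;
		}

		/* Dispatch bd->rq to the underlying device here. */
		return BLK_STS_OK;
	}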