On Mon, 2022-02-14 at 09:50 -0500, John Pittman wrote:
> This patch has now been tested in the customer environment and
> results were good (fixed the hangs).
>
> On Mon, Feb 7, 2022 at 9:45 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > On Tue, Feb 1, 2022 at 4:34 AM David Jeffery <djeffery@xxxxxxxxxx>
> > wrote:
> > >
> > > When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it
> > > can reset the delay length for an already pending delayed work
> > > run_work. This creates a scenario where multiple hctx may have their
> > > queues set to run, but if one runs first and finds nothing to do, it
> > > can reset the delay of another hctx and stall the other hctx's
> > > ability to run requests.
> > >
> > > To avoid this I/O stall when an hctx's run_work is already pending,
> > > leave it untouched to run at its current designated time rather than
> > > extending its delay. The work will still run, which keeps closed the
> > > race that calling blk_mq_delay_run_hw_queues is needed for, while also
> > > avoiding the I/O stall.
> > >
> > > Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> > > ---
> > >  block/blk-mq.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index f3bf3358a3bb..ae46eb4bf547 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2177,6 +2177,14 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
> > >  	queue_for_each_hw_ctx(q, hctx, i) {
> > >  		if (blk_mq_hctx_stopped(hctx))
> > >  			continue;
> > > +		/*
> > > +		 * If there is already a run_work pending, leave the
> > > +		 * pending delay untouched. Otherwise, a hctx can stall
> > > +		 * if another hctx is re-delaying the other's work
> > > +		 * before the work executes.
> > > +		 */
> > > +		if (delayed_work_pending(&hctx->run_work))
> > > +			continue;
> >
> > The issue is triggered on BFQ, since BFQ's has_work() may return true
> > while its ->dispatch_request() may return NULL, so
> > blk_mq_delay_run_hw_queues() is run to schedule a delayed run.
> >
> > With multiple hw queues, the described issue may be triggered and
> > cause an I/O stall for a long time. There are only 3 in-tree callers
> > of blk_mq_delay_run_hw_queues(), and David's fix works well for all 3
> > users, so this patch looks fine:
> >
> > Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >
> > Thanks,
> >

Hello Jens, gentle ping, can we get this in please?

Sincerely,
Laurence and the RH team