On Mon, 2022-01-31 at 15:33 -0500, David Jeffery wrote:
> When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can
> reset the delay length for an already pending delayed work run_work. This
> creates a scenario where multiple hctx may have their queues set to run,
> but if one runs first and finds nothing to do, it can reset the delay of
> another hctx and stall the other hctx's ability to run requests.
>
> To avoid this I/O stall when an hctx's run_work is already pending,
> leave it untouched to run at its current designated time rather than
> extending its delay. The work will still run which keeps closed the race
> calling blk_mq_delay_run_hw_queues is needed for while also avoiding the
> I/O stall.
>
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> ---
>  block/blk-mq.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f3bf3358a3bb..ae46eb4bf547 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2177,6 +2177,14 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (blk_mq_hctx_stopped(hctx))
>  			continue;
> +		/*
> +		 * If there is already a run_work pending, leave the
> +		 * pending delay untouched. Otherwise, a hctx can stall
> +		 * if another hctx is re-delaying the other's work
> +		 * before the work executes.
> +		 */
> +		if (delayed_work_pending(&hctx->run_work))
> +			continue;
>  		/*
>  		 * Dispatch from this hctx either if there's no hctx preferred
>  		 * by IO scheduler or if it has requests that bypass the

Ming is aware of this patch and asked David to submit it. David has
already explained his reasoning internally. The patch addresses an issue
already reported by a customer.

Reviewed-by: Laurence Oberman <loberman@xxxxxxxxxx>
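
For anyone following along, the stall described in the commit message can
be illustrated with a small user-space model. This is only a sketch, not
kernel code: struct model_work, model_delay_run() and first_run_time() are
hypothetical names, time is counted in whole milliseconds, and the
"another context asks for a delayed run every `period` ms" behaviour is an
assumption standing in for an hctx that runs, finds nothing to do, and
requests a delayed run again.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one hctx's delayed run_work. */
struct model_work {
	bool pending;		/* is a delayed run already queued? */
	unsigned long expires;	/* model time (ms) at which it should fire */
};

/*
 * Models arming the delayed work. With skip_pending set, an already
 * pending work keeps its expiry, mirroring the check the patch adds.
 * Without it, every call pushes the expiry further out.
 */
static void model_delay_run(struct model_work *w, unsigned long now,
			    unsigned long msecs, bool skip_pending)
{
	if (skip_pending && w->pending)
		return;			/* leave the existing expiry alone */
	w->pending = true;
	w->expires = now + msecs;	/* (re-)arm relative to the current time */
}

/*
 * Returns the model time at which the work first runs, or -1 if it has
 * not run by `horizon`. Another context requests a delayed run every
 * `period` ms, starting at time 0.
 */
static long first_run_time(unsigned long period, unsigned long msecs,
			   unsigned long horizon, bool skip_pending)
{
	struct model_work w = { .pending = false, .expires = 0 };

	for (unsigned long now = 0; now <= horizon; now++) {
		if (w.pending && now >= w.expires)
			return (long)now;	/* the work finally executes */
		if (now % period == 0)
			model_delay_run(&w, now, msecs, skip_pending);
	}
	return -1;
}
```

In this model, with a 3 ms delay that is re-requested every 2 ms,
first_run_time(2, 3, 100, false) returns -1 because the expiry keeps moving
out of reach, while first_run_time(2, 3, 100, true) returns 3 because the
pending work keeps its original expiry.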