Re: [PATCH] blk-mq: avoid extending delays of active hctx from blk_mq_delay_run_hw_queues

Laurence Oberman <loberman@xxxxxxxxxx> · Tue, 01 Feb 2022 08:39:31 -0500

On Mon, 2022-01-31 at 15:33 -0500, David Jeffery wrote:
> When blk_mq_delay_run_hw_queues sets an hctx to run in the future, it can
> reset the delay length for an already pending delayed work run_work. This
> creates a scenario where multiple hctx may have their queues set to run,
> but if one runs first and finds nothing to do, it can reset the delay of
> another hctx and stall the other hctx's ability to run requests.
> 
> To avoid this I/O stall when an hctx's run_work is already pending,
> leave it untouched to run at its current designated time rather than
> extending its delay. The work will still run which keeps closed the race
> calling blk_mq_delay_run_hw_queues is needed for while also avoiding the
> I/O stall.
> 
> Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> ---
>  block/blk-mq.c |    8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f3bf3358a3bb..ae46eb4bf547 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2177,6 +2177,14 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		if (blk_mq_hctx_stopped(hctx))
>  			continue;
> +		/*
> +		 * If there is already a run_work pending, leave the
> +		 * pending delay untouched. Otherwise, a hctx can stall
> +		 * if another hctx is re-delaying the other's work
> +		 * before the work executes.
> +		 */
> +		if (delayed_work_pending(&hctx->run_work))
> +			continue;
>  		/*
>  		 * Dispatch from this hctx either if there's no hctx preferred
>  		 * by IO scheduler or if it has requests that bypass the
> 
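
For anyone following along, the stall described in the commit message can
be sketched in userspace. This is only a toy model under stated
assumptions, not kernel code: struct sim_work, sim_mod_delayed_work() and
sim_first_run() are hypothetical names, time is counted in abstract ticks,
and the re-arm is assumed to behave like mod_delayed_work(), i.e. it always
pushes the expiry out to now + delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a delayed work item; not the kernel's struct. */
struct sim_work {
	bool pending;           /* timer armed, work not yet run */
	unsigned long expires;  /* tick at which the work may run */
};

/* Assumed mod_delayed_work()-like semantics: always (re)arm the timer. */
static void sim_mod_delayed_work(struct sim_work *w, unsigned long now,
				 unsigned long delay)
{
	w->pending = true;
	w->expires = now + delay;
}

/*
 * Arm one work item at tick 0, then let "another hctx" call the delay-run
 * helper every `period` ticks. Returns the tick at which the work first
 * runs, or -1 if it never runs within `horizon` ticks.
 *
 * skip_pending == true models the patched behaviour: a pending work item
 * is left untouched instead of having its delay extended.
 */
static long sim_first_run(bool skip_pending, unsigned long delay,
			  unsigned long period, unsigned long horizon)
{
	struct sim_work w = { false, 0 };

	assert(period > 0);
	sim_mod_delayed_work(&w, 0, delay);

	for (unsigned long now = 1; now <= horizon; now++) {
		/* The timer fires once its expiry has been reached. */
		if (w.pending && now >= w.expires)
			return (long)now;

		if (now % period == 0) {
			/* Patched path: do not touch pending work. */
			if (skip_pending && w.pending)
				continue;
			/* Unpatched path: the delay is reset every time. */
			sim_mod_delayed_work(&w, now, delay);
		}
	}
	return -1;
}
```

With a delay of 3 ticks and a re-arm every 2 ticks, sim_first_run(false, 3,
2, 20) returns -1 because the expiry keeps moving ahead of the clock, while
sim_first_run(true, 3, 2, 20) returns 3, the originally scheduled time.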

Ming is aware of this patch and asked David to submit it.
David has already explained his reasoning internally.
It fixes an issue already reported by a customer.

Reviewed-by: Laurence Oberman <loberman@xxxxxxxxxx>