Re: [PATCH] block: limit request dispatch loop duration

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Wed, 16 Mar 2022 15:55:11 +0900

On 3/16/22 15:11, Shin'ichiro Kawasaki wrote:
> When IO requests are made continuously and the target block device
> handles requests faster than request arrival, the request dispatch loop
> keeps on repeating to dispatch the arriving requests very long time,
> more than a minute. Since the loop runs as a workqueue worker task, the
> very long loop duration triggers workqueue watchdog timeout and BUG [1].
> 
> To avoid the very long loop duration, break the loop periodically. When
> opportunity to dispatch requests still exists, check need_resched(). If
> need_resched() returns true, the dispatch loop already consumed its time
> slice, then reschedule the dispatch work and break the loop. With heavy
> IO load, need_resched() does not return true for 20~30 seconds. To cover
> such case, check time spent in the dispatch loop with jiffies. If more
> than 1 second is spent, reschedule the dispatch work and break the loop.
> 
> [1]
> 
> [  609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s!
> [  609.701820] Showing busy workqueues and worker pools:
> [  609.707915] workqueue events: flags=0x0
> [  609.712615]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [  609.712626]     pending: drm_fb_helper_damage_work [drm_kms_helper]
> [  609.712687] workqueue events_freezable: flags=0x4
> [  609.732943]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [  609.732952]     pending: pci_pme_list_scan
> [  609.732968] workqueue events_power_efficient: flags=0x80
> [  609.751947]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [  609.751955]     pending: neigh_managed_work
> [  609.752018] workqueue kblockd: flags=0x18
> [  609.769480]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
> [  609.769488]     in-flight: 1020:blk_mq_run_work_fn
> [  609.769498]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
> [  609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67
> [  639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s!
> [  639.909513] Showing busy workqueues and worker pools:
> [  639.915404] workqueue events: flags=0x0
> [  639.920197]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> [  639.920215]     pending: drm_fb_helper_damage_work [drm_kms_helper]
> [  639.920365] workqueue kblockd: flags=0x18
> [  639.939932]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
> [  639.939942]     in-flight: 1020:blk_mq_run_work_fn
> [  639.939955]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
> [  639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67
> 
> Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io")
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v5.10+
> Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/
> Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>
> Tested-by: Ming Lei <ming.lei@xxxxxxxxxx>
> ---
>  block/blk-mq-sched.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 55488ba97823..64941615befc 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
>  {
>  	int ret;
> +	unsigned long end = jiffies + HZ;
>  
>  	do {
>  		ret = __blk_mq_do_dispatch_sched(hctx);
> +		if (ret == 1 &&
> +		    (need_resched() || time_is_after_jiffies(end))) {
> +			blk_mq_delay_run_hw_queue(hctx, 0);
> +			break;

Minor nit: you could "return 1;" directly here.

> +		}
>  	} while (ret == 1);
>  
>  	return ret;

Otherwise, looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx>

-- 
Damien Le Moal
Western Digital Research