On 3/16/22 15:11, Shin'ichiro Kawasaki wrote: > When IO requests are made continuously and the target block device > handles requests faster than request arrival, the request dispatch loop > keeps on repeating to dispatch the arriving requests very long time, > more than a minute. Since the loop runs as a workqueue worker task, the > very long loop duration triggers workqueue watchdog timeout and BUG [1]. > > To avoid the very long loop duration, break the loop periodically. When > opportunity to dispatch requests still exists, check need_resched(). If > need_resched() returns true, the dispatch loop already consumed its time > slice, then reschedule the dispatch work and break the loop. With heavy > IO load, need_resched() does not return true for 20~30 seconds. To cover > such case, check time spent in the dispatch loop with jiffies. If more > than 1 second is spent, reschedule the dispatch work and break the loop. > > [1] > > [ 609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s! > [ 609.701820] Showing busy workqueues and worker pools: > [ 609.707915] workqueue events: flags=0x0 > [ 609.712615] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 > [ 609.712626] pending: drm_fb_helper_damage_work [drm_kms_helper] > [ 609.712687] workqueue events_freezable: flags=0x4 > [ 609.732943] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 > [ 609.732952] pending: pci_pme_list_scan > [ 609.732968] workqueue events_power_efficient: flags=0x80 > [ 609.751947] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 > [ 609.751955] pending: neigh_managed_work > [ 609.752018] workqueue kblockd: flags=0x18 > [ 609.769480] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 > [ 609.769488] in-flight: 1020:blk_mq_run_work_fn > [ 609.769498] pending: blk_mq_timeout_work, blk_mq_run_work_fn > [ 609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67 > [ 639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s! > [ 639.909513] Showing busy workqueues and worker pools: > [ 639.915404] workqueue events: flags=0x0 > [ 639.920197] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2 > [ 639.920215] pending: drm_fb_helper_damage_work [drm_kms_helper] > [ 639.920365] workqueue kblockd: flags=0x18 > [ 639.939932] pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4 > [ 639.939942] in-flight: 1020:blk_mq_run_work_fn > [ 639.939955] pending: blk_mq_timeout_work, blk_mq_run_work_fn > [ 639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67 > > Fixes: 6e6fcbc27e778 ("blk-mq: support batching dispatch in case of io") > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@xxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx # v5.10+ > Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/ > Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx> > Tested-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- > block/blk-mq-sched.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c > index 55488ba97823..64941615befc 100644 > --- a/block/blk-mq-sched.c > +++ b/block/blk-mq-sched.c > @@ -181,9 +181,15 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx) > { > int ret; > + unsigned long end = jiffies + HZ; > > do { > ret = __blk_mq_do_dispatch_sched(hctx); > + if (ret == 1 && > + (need_resched() || time_is_after_jiffies(end))) { > + blk_mq_delay_run_hw_queue(hctx, 0); > + break; Minor nit: you could "return 1;" directly here. > + } > } while (ret == 1); > > return ret; Otherwise, looks good. Reviewed-by: Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> -- Damien Le Moal Western Digital Research