Hello Xiongfeng,

On Tue, Feb 15, 2022 at 10:29:51AM +0800, Xiongfeng Wang wrote:
> Hi Ming,
>
> Sorry to disturb you. It's just that I think you may be interested in this
> patch. I found the following commit written by you.
> commit 11ea68f553e244851d15793a7fa33a97c46d8271
>   genirq, sched/isolation: Isolate from handling managed interrupts
> It removed the managed_irq interruption from non-housekeeping CPUs as long as
> the non-housekeeping CPUs do not request IO. But the work thread
> blk_mq_run_work_fn() may still run on the non-housekeeping CPUs.
> Appreciate it a lot if you can give it a look.

Yeah, commit 11ea68f553e24 touches the irq subsystem to try not to assign
isolated cpus for a managed irq's effective affinity. Here blk-mq just selects
one cpu and calls mod_delayed_work_on() to execute the run queue handler on
the specified cpu. There are lots of such bound wq usages in tree, so I guess
it might be a generic wq or scheduler problem rather than a blk-mq specific
issue. Not sure if it is good to address it in the block layer.

thanks,
Ming

>
> Thanks,
> Xiongfeng
>
> On 2022/2/10 17:35, Xiongfeng Wang wrote:
> > When NOHZ_FULL is enabled, such as in an HPC situation, CPUs are divided
> > into housekeeping CPUs and non-housekeeping CPUs. Non-housekeeping CPUs
> > are NOHZ_FULL CPUs and are often monopolized by a userspace process,
> > such as an HPC application process. Any sort of interruption is not expected.
> >
> > blk_mq_hctx_next_cpu() selects each cpu in 'hctx->cpumask' alternately
> > to schedule the work thread blk_mq_run_work_fn(). When 'hctx->cpumask'
> > contains housekeeping CPUs and non-housekeeping CPUs at the same time, a
> > housekeeping CPU that wants to issue an IO request may schedule a worker
> > on a non-housekeeping CPU. This may affect the performance of the
> > userspace application running on the non-housekeeping CPUs.
> >
> > So let's just schedule the worker thread on the current CPU when the
> > current CPU is a housekeeping CPU.
> >
> > Signed-off-by: Xiongfeng Wang <wangxiongfeng2@xxxxxxxxxx>
> > ---
> >  block/blk-mq.c | 15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 1adfe4824ef5..ff9a4bf16858 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/sched/sysctl.h>
> >  #include <linux/sched/topology.h>
> >  #include <linux/sched/signal.h>
> > +#include <linux/sched/isolation.h>
> >  #include <linux/delay.h>
> >  #include <linux/crash_dump.h>
> >  #include <linux/prefetch.h>
> > @@ -2036,6 +2037,8 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
> >  static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
> >  					unsigned long msecs)
> >  {
> > +	int work_cpu;
> > +
> >  	if (unlikely(blk_mq_hctx_stopped(hctx)))
> >  		return;
> >
> > @@ -2050,7 +2053,17 @@ static void __blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async,
> >  		put_cpu();
> >  	}
> >
> > -	kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
> > +	/*
> > +	 * Avoid housekeeping CPUs scheduling a worker on a non-housekeeping
> > +	 * CPU
> > +	 */
> > +	if (tick_nohz_full_enabled() && housekeeping_cpu(smp_processor_id(),
> > +							 HK_FLAG_WQ))
> > +		work_cpu = smp_processor_id();
> > +	else
> > +		work_cpu = blk_mq_hctx_next_cpu(hctx);
> > +
> > +	kblockd_mod_delayed_work_on(work_cpu, &hctx->run_work,
> >  				    msecs_to_jiffies(msecs));
> >  }
> >
> >
>

-- 
Ming
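
For illustration only, here is a minimal user-space sketch in C of the CPU
selection discussed in this thread. It is not kernel code: struct model_hctx,
model_next_cpu(), model_pick_work_cpu(), and MODEL_NR_CPUS are hypothetical
names made up for this model, plain bitmasks stand in for cpumasks, and the
round-robin helper only loosely mimics blk_mq_hctx_next_cpu() (it leaves out
any batching or CPU-hotplug handling the real function performs). It shows
how the quoted patch would keep the work on the submitting CPU when that CPU
is a housekeeping CPU, and otherwise fall back to round-robin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: at most 8 CPUs, one bit per CPU in a mask. */
#define MODEL_NR_CPUS 8

struct model_hctx {
	uint32_t cpumask;	/* bit i set: CPU i may run this queue's work */
	int next_cpu;		/* last CPU chosen by round-robin, -1 initially */
};

/*
 * Round-robin over the set bits of hctx->cpumask. This is a simplified
 * stand-in for blk_mq_hctx_next_cpu(), not a copy of it.
 */
static int model_next_cpu(struct model_hctx *hctx)
{
	assert(hctx->cpumask != 0);

	int cpu = hctx->next_cpu;

	do {
		cpu = (cpu + 1) % MODEL_NR_CPUS;
	} while (!(hctx->cpumask & (1u << cpu)));

	hctx->next_cpu = cpu;
	return cpu;
}

/*
 * Policy from the quoted patch: when nohz_full is in use and the submitting
 * CPU is a housekeeping CPU, run the work there; otherwise use round-robin.
 */
static int model_pick_work_cpu(struct model_hctx *hctx, int this_cpu,
			       uint32_t housekeeping_mask, bool nohz_full)
{
	if (nohz_full && (housekeeping_mask & (1u << this_cpu)))
		return this_cpu;

	return model_next_cpu(hctx);
}
```

With cpumask 0x0F (CPUs 0-3) and housekeeping mask 0x03 (CPUs 0-1), a
submission from CPU 1 stays on CPU 1 under this policy, while a submission
from CPU 2 still goes through the round-robin path and can land on any CPU
in the mask, isolated ones included. That second case is the part a generic
wq or scheduler level fix would have to cover.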