On 12/15/21 3:25 AM, John Garry wrote: > On 14/12/2021 20:49, Jens Axboe wrote: >> Dexuan reports that he's seeing spikes of very heavy CPU utilization when >> running 24 disks and using the 'none' scheduler. This happens off the >> sched restart path, because SCSI requires the queue to be restarted async, >> and hence we're hammering on mod_delayed_work_on() to ensure that the work >> item gets run appropriately. >> >> Avoid hammering on the timer and just use queue_work_on() if no delay >> has been specified. >> >> Reported-and-tested-by: Dexuan Cui <decui@xxxxxxxxxxxxx> >> Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> >> --- >> >> diff --git a/block/blk-core.c b/block/blk-core.c >> index 1378d084c770..c1833f95cb97 100644 >> --- a/block/blk-core.c >> +++ b/block/blk-core.c >> @@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work); >> int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork, >> unsigned long delay) >> { >> + if (!delay) >> + return queue_work_on(cpu, kblockd_workqueue, &dwork->work); >> return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay); >> } >> EXPORT_SYMBOL(kblockd_mod_delayed_work_on); >> > > Hi Jens, > > I have a related comment on the current code and interface it uses, if > you don't mind, as I did wonder if we are doing a msec_to_jiffies(0 [not > built-in const]) call somewhere. > > So we pass msecs to blk-mq.c, and we do a msec_to_jiffies() call on it > before calling kblockd_mod_delayed_work_on(). Now most/all callsites > uses const value for the msec value, so if we did the msec_to_jiffies() > conversion at the callsites and passed a jiffies value, it should be > compiled out by gcc. This is my current __blk_mq_delay_run_hw_queue > assembler: > > 0000000000001ef0 <__blk_mq_delay_run_hw_queue>: > [snip] > 2024: a942dfb6 ldp x22, x23, [x29, #40] > 2028: 2a1503e0 mov w0, w21 > 202c: 94000000 bl 0 <__msecs_to_jiffies> > kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work, > 2030: aa0003e2 mov x2, x0 > 2034: 91010261 add x1, x19, #0x40 > 2038: 2a1403e0 mov w0, w20 > 203c: 94000000 bl 0 <kblockd_mod_delayed_work_on> > > I'm not sure if you would want to change so many APIs or if jiffies is > sensible to pass or even any performance gain. Additionally Function > blk_mq_delay_kick_requeue_list() would not see so much gain in such a > change as msec value is not const. Any thoughts? Maybe testing > performance would not do much harm. In general I totally agree with you, it'd be smarter to flip the conversion so it can be done in a more efficient manner. At the same time, the queue delay running is not at all a fast path, so shouldn't really matter in practice. -- Jens Axboe