On 14/12/2021 20:49, Jens Axboe wrote:
Dexuan reports that he's seeing spikes of very heavy CPU utilization when
running 24 disks and using the 'none' scheduler. This happens off the
sched restart path, because SCSI requires the queue to be restarted async,
and hence we're hammering on mod_delayed_work_on() to ensure that the work
item gets run appropriately.
Avoid hammering on the timer and just use queue_work_on() if no delay
has been specified.
Reported-and-tested-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
Link: https://lore.kernel.org/linux-block/BYAPR21MB1270C598ED214C0490F47400BF719@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
unsigned long delay)
{
+ if (!delay)
+ return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
}
EXPORT_SYMBOL(kblockd_mod_delayed_work_on);
Hi Jens,
I have a related comment on the current code and interface it uses, if
you don't mind, as I did wonder if we are doing a msec_to_jiffies(0 [not
built-in const]) call somewhere.
So we pass msecs to blk-mq.c, and we do a msec_to_jiffies() call on it
before calling kblockd_mod_delayed_work_on(). Now most/all callsites
uses const value for the msec value, so if we did the msec_to_jiffies()
conversion at the callsites and passed a jiffies value, it should be
compiled out by gcc. This is my current __blk_mq_delay_run_hw_queue
assembler:
0000000000001ef0 <__blk_mq_delay_run_hw_queue>:
[snip]
2024: a942dfb6 ldp x22, x23, [x29, #40]
2028: 2a1503e0 mov w0, w21
202c: 94000000 bl 0 <__msecs_to_jiffies>
kblockd_mod_delayed_work_on(blk_mq_hctx_next_cpu(hctx), &hctx->run_work,
2030: aa0003e2 mov x2, x0
2034: 91010261 add x1, x19, #0x40
2038: 2a1403e0 mov w0, w20
203c: 94000000 bl 0 <kblockd_mod_delayed_work_on>
I'm not sure if you would want to change so many APIs or if jiffies is
sensible to pass or even any performance gain. Additionally Function
blk_mq_delay_kick_requeue_list() would not see so much gain in such a
change as msec value is not const. Any thoughts? Maybe testing
performance would not do much harm.
Thanks,
John