On 12/14/21 8:04 AM, Christoph Hellwig wrote:
> On Tue, Dec 14, 2021 at 07:53:46AM -0700, Jens Axboe wrote:
>> Dexuan reports that he's seeing spikes of very heavy CPU utilization when
>> running 24 disks and using the 'none' scheduler. This happens off the
>> flush path, because SCSI requires the queue to be restarted async, and
>> hence we're hammering on mod_delayed_work_on() to ensure that the work
>> item gets run appropriately.
>>
>> What we care about here is that the queue is run, and we don't need to
>> repeatedly re-arm the timer associated with the delayed work item. If we
>> check if the work item is pending upfront, then we don't really need to do
>> anything else. This is safe as the work pending bit is cleared before a
>> work item is started.
>>
>> The only potential caveat here is if we have callers with wildly different
>> timeouts specified. That's generally not the case, so don't think we need
>> to care for that case.
>
> So why not do a non-delayed queue_work for that case? Might be good
> to get the scsi and workqueue maintainers involved to understand the
> issue a bit better first.

We can probably get by with doing just that, and just ignore if a delayed
work timer is already running.

Dexuan, can you try this one?


diff --git a/block/blk-core.c b/block/blk-core.c
index 1378d084c770..c1833f95cb97 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1484,6 +1484,8 @@ EXPORT_SYMBOL(kblockd_schedule_work);
 int kblockd_mod_delayed_work_on(int cpu, struct delayed_work *dwork,
 				unsigned long delay)
 {
+	if (!delay)
+		return queue_work_on(cpu, kblockd_workqueue, &dwork->work);
 	return mod_delayed_work_on(cpu, kblockd_workqueue, dwork, delay);
 }
 EXPORT_SYMBOL(kblockd_mod_delayed_work_on);

-- 
Jens Axboe
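
For anyone who wants to reason about the zero-delay fast path without
booting a kernel, here is a rough userspace sketch of the idea. It is a toy
model and not kernel code: every toy_* name is hypothetical, and the
rearm_ops counter is only an assumed stand-in for the per-call cost of
modifying an already queued delayed work item. The work item is never
actually run in this model, so the pending bit stays set for the whole burst.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct delayed_work; not the kernel's layout. */
struct toy_dwork {
	bool pending;            /* models the work pending bit */
	unsigned long rearm_ops; /* assumed cost: modify/re-arm operations */
	unsigned long queued;    /* times the item was newly queued */
};

/* Models queue_work_on(): does nothing if the item is already pending. */
static bool toy_queue_work(struct toy_dwork *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->queued++;
	return true;
}

/* Models mod_delayed_work_on(): every call pays for a modify operation. */
static bool toy_mod_delayed_work(struct toy_dwork *w, unsigned long delay)
{
	bool was_pending = w->pending;

	(void)delay; /* the toy model has no real timer */
	w->rearm_ops++;
	w->pending = true;
	if (!was_pending)
		w->queued++;
	return was_pending;
}

/* Models the patched kblockd_mod_delayed_work_on() from the diff above. */
static bool toy_kblockd_mod_delayed_work(struct toy_dwork *w,
					 unsigned long delay)
{
	if (!delay)
		return toy_queue_work(w);
	return toy_mod_delayed_work(w, delay);
}

/* Issue 'calls' back-to-back zero-delay requests and return the counters. */
struct toy_dwork toy_burst(int calls, bool patched)
{
	struct toy_dwork w = { false, 0, 0 };

	assert(calls >= 0);
	for (int i = 0; i < calls; i++) {
		if (patched)
			toy_kblockd_mod_delayed_work(&w, 0);
		else
			toy_mod_delayed_work(&w, 0);
	}
	return w;
}
```

Under this model, toy_burst(1000, false) records 1000 modify operations while
toy_burst(1000, true) records none, and the item is queued exactly once in
both cases. That is the behavior the patch is after: the queue still gets
run, but repeated zero-delay callers stop paying for the modify path.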