On 8/26/21 12:09 PM, Bart Van Assche wrote: > On 8/26/21 7:40 AM, Zhen Lei wrote: >> lock protection needs to be added only in dd_finish_request(), which >> is unlikely to cause significant performance side effects. > > Not sure the above is correct. Every new atomic instruction has a > measurable performance overhead. But I guess in this case that > overhead is smaller than the time needed to sum 128 per-CPU variables. perpcu counters only really work, if the summing is not in a hot path, or if the summing is just some "not zero" thing instead of a full sum. They just don't scale at all for even moderately sized systems. >> Tested on my 128-core board with two ssd disks. >> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others> >> Before: >> [183K/0/0 iops] >> [172K/0/0 iops] >> >> After: >> [258K/0/0 iops] >> [258K/0/0 iops] > > Nice work! > >> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests") > > Shouldn't the Fixes: tag be used only for patches that modify > functionality? I'm not sure it is appropriate to use this tag for > performance improvements. For a regression this big, I think it's the right thing. Anyone that may backport the original commit definitely should also get the followup fix. This isn't just a performance improvement, it's fixing a big performance regression. -- Jens Axboe