On 12/7/22 5:35 PM, Keith Busch wrote:
> On Wed, Dec 07, 2022 at 11:17:12PM +0000, Chaitanya Kulkarni wrote:
>> On 12/7/22 15:08, Jens Axboe wrote:
>>>
>>> My default peak testing runs at 122M IOPS. That's also the peak IOPS of
>>> the devices combined, and with iostats disabled. If I enabled iostats,
>>> then the performance drops to 112M IOPS. It's no longer device limited,
>>> that's a drop of about 8.2%.
>>>
>>
>> Wow, clearly not acceptable, that's exactly why I asked for perf
>> numbers :).
>
> For the record, we did say per-io ktime_get() has a measurable
> performance harm and should be aggregated.
>
> https://www.spinics.net/lists/linux-block/msg89937.html

Yes, I reiterated that in the v1 posting as well, and mentioned it was
the reason the time batching was done. From the results I posted, if you
look at a profile of the run, here are the time related additions:

+   27.22%  io_uring  [kernel.vmlinux]  [k] read_tsc
+    4.37%  io_uring  [kernel.vmlinux]  [k] ktime_get

which are #1 and #4 in the profile, respectively. That's a LOT of added
overhead. Not sure why people think time keeping is free, particularly
high granularity time keeping. It's definitely not, and adding 2-3
timestamp reads per IO is very noticeable.

-- 
Jens Axboe
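
For anyone following along, here is a rough userspace sketch of what "time batching" means in this context: read the clock once per batch of IOs and reuse that value, instead of reading it for every IO. This is not the kernel patch. The names `io_batch`, `stamp_per_io`, `stamp_batched`, `batch_reset` and `count_clock_reads` are made up for illustration, and `clock_gettime(CLOCK_MONOTONIC)` merely stands in for `ktime_get()`.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical per-batch state: one timestamp shared by all IOs in a batch. */
struct io_batch {
    uint64_t cached_ns;    /* timestamp sampled for the current batch */
    int      valid;        /* nonzero once cached_ns has been sampled */
    unsigned clock_reads;  /* how many times the clock was actually read */
};

/* Userspace stand-in for ktime_get(): monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Per-IO variant: every call pays for a clock read. */
static uint64_t stamp_per_io(struct io_batch *b)
{
    b->clock_reads++;
    return now_ns();
}

/* Batched variant: only the first IO of a batch reads the clock. */
static uint64_t stamp_batched(struct io_batch *b)
{
    if (!b->valid) {
        b->cached_ns = now_ns();
        b->valid = 1;
        b->clock_reads++;
    }
    return b->cached_ns;
}

/* Start a new batch: drop the cached timestamp, keep the read counter. */
static void batch_reset(struct io_batch *b)
{
    b->valid = 0;
}

/*
 * Count clock reads needed to stamp n_ios IOs submitted in batches of
 * batch_size (assumed > 0), with or without batching.
 */
static unsigned count_clock_reads(unsigned n_ios, unsigned batch_size,
                                  int batched)
{
    struct io_batch b = { 0 };

    for (unsigned i = 0; i < n_ios; i++) {
        if (i % batch_size == 0)
            batch_reset(&b);
        (void)(batched ? stamp_batched(&b) : stamp_per_io(&b));
    }
    return b.clock_reads;
}
```

With 64 IOs in batches of 32, the per-IO variant reads the clock 64 times and the batched variant twice. The price is that every IO in a batch carries the same, slightly stale timestamp, so the accounting gets coarser as the batch gets larger.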