On 11/23/23 11:12 AM, Kanchan Joshi wrote:
> On 11/23/2023 9:00 PM, Christoph Hellwig wrote:
>> The rest looks good, but that stats overhead seems pretty horrible..
>
> On my setup
> Before[1]: 7.06M
> After[2]: 8.29M
>
> [1]
> # taskset -c 2,3 t/io_uring -b512 -d256 -c32 -s32 -p1 -F1 -B1 -O0 -n2
> -u1 -r4 /dev/ng0n1 /dev/ng1n1
> submitter=0, tid=2076, file=/dev/ng0n1, node=-1
> submitter=1, tid=2077, file=/dev/ng1n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> IOPS=6.95M, BW=3.39GiB/s, IOS/call=32/31
> IOPS=7.06M, BW=3.45GiB/s, IOS/call=32/32
> IOPS=7.06M, BW=3.45GiB/s, IOS/call=32/31
> Exiting on timeout
> Maximum IOPS=7.06M
>
> [2]
> # taskset -c 2,3 t/io_uring -b512 -d256 -c32 -s32 -p1 -F1 -B1 -O0 -n2
> -u1 -r4 /dev/ng0n1 /dev/ng1n1
> submitter=0, tid=2123, file=/dev/ng0n1, node=-1
> submitter=1, tid=2124, file=/dev/ng1n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> IOPS=8.27M, BW=4.04GiB/s, IOS/call=32/31
> IOPS=8.29M, BW=4.05GiB/s, IOS/call=32/31
> IOPS=8.29M, BW=4.05GiB/s, IOS/call=31/31
> Exiting on timeout
> Maximum IOPS=8.29M

It's all really down to how expensive getting the current time is on
your box; some will be better and some worse.

One idea that has been bounced around in the past is to have a
blk_ktime_get_ns() and have it be something ala:

u64 blk_ktime_get_ns(void)
{
	struct blk_plug *plug = current->plug;

	/* No plug active: nothing to cache in, read the clock directly */
	if (!plug)
		return ktime_get_ns();

	/* First request in this plug window: take a stamp and mark it valid */
	if (!plug->ktime_valid) {
		plug->ktime = ktime_get_ns();
		plug->ktime_valid = true;
	}

	return plug->ktime;
}

in freestyle form, with the idea being that we don't care about
granularity to the extent that we'd need a new stamp every time. If the
task is scheduled out, the plug is flushed anyway, which should
invalidate the stamp.
For preemption this isn't true iirc, so we'd need some kind of
blk_flush_plug_ts() or something for that case to invalidate it.

Hopefully this could then also get away from passing in a cached value
that we do in various spots, exactly because all of this time stamping
is expensive. It's also a bit of a game of whack-a-mole, as users get
added and distro kernels tend to turn on basically everything anyway.

-- 
Jens Axboe
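
The caching and invalidation idea described above can be tried out in
userspace. What follows is only a minimal sketch under stated
assumptions, not kernel code: struct plug_sketch, clock_get_ns(),
plug_time_get_ns(), plug_time_invalidate() and the clock_reads counter
are all hypothetical names made up for illustration, and timespec_get()
stands in for ktime_get_ns() (the kernel would use a monotonic clock).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical userspace stand-in for the two plug fields the idea needs. */
struct plug_sketch {
	uint64_t ktime;		/* cached timestamp in nanoseconds */
	bool ktime_valid;	/* true once ktime holds a usable stamp */
};

/* Counts real clock reads, so the saving from caching can be observed. */
unsigned long clock_reads;

/* Stand-in for ktime_get_ns(): the "expensive" clock read. */
uint64_t clock_get_ns(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	clock_reads++;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Return the cached stamp if a plug is active, reading the clock at most
 * once per plug window. Without a plug, always read the clock. */
uint64_t plug_time_get_ns(struct plug_sketch *plug)
{
	if (!plug)
		return clock_get_ns();

	if (!plug->ktime_valid) {
		plug->ktime = clock_get_ns();
		plug->ktime_valid = true;
	}
	return plug->ktime;
}

/* Assumed counterpart of the flush/preemption hook: drop the stamp so the
 * next caller takes a fresh one. */
void plug_time_invalidate(struct plug_sketch *plug)
{
	if (plug)
		plug->ktime_valid = false;
}
```

With a zero-initialized struct plug_sketch, repeated calls to
plug_time_get_ns() return the same value and bump clock_reads only once,
until plug_time_invalidate() is called. Passing NULL models the
no-plug case, where every call reads the clock.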