On 11/24/23 4:28 AM, Kundan Kumar wrote:
> This is what I see with your changes on my setup.
>
> Before[1]: 7.06M
> After[2]: 7.52M
>
> [1]
> # taskset -c 2,3 t/io_uring -b512 -d256 -c32 -s32 -p1 -F1 -B1 -O0 -n2 \
>   -u1 -r4 /dev/ng0n1 /dev/ng1n1
> submitter=0, tid=2076, file=/dev/ng0n1, node=-1
> submitter=1, tid=2077, file=/dev/ng1n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> IOPS=6.95M, BW=3.39GiB/s, IOS/call=32/31
> IOPS=7.06M, BW=3.45GiB/s, IOS/call=32/32
> IOPS=7.06M, BW=3.45GiB/s, IOS/call=32/31
> Exiting on timeout
> Maximum IOPS=7.06M
>
> [2]
> # taskset -c 2,3 t/io_uring -b512 -d256 -c32 -s32 -p1 -F1 -B1 -O0 -n2 \
>   -u1 -r4 /dev/ng0n1 /dev/ng1n1
> submitter=0, tid=2204, file=/dev/ng0n1, node=-1
> submitter=1, tid=2205, file=/dev/ng1n1, node=-1
> polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=256
> Engine=io_uring, sq_ring=256, cq_ring=256
> IOPS=7.40M, BW=3.62GiB/s, IOS/call=32/31
> IOPS=7.51M, BW=3.67GiB/s, IOS/call=32/31
> IOPS=7.52M, BW=3.67GiB/s, IOS/call=32/32
> Exiting on timeout
> Maximum IOPS=7.52M
>
> The original patch avoids processing throttle stats and wbt_issue/done
> stats for the passthrough-io path.
>
> Improvement with the original patch: 7.06M -> 8.29M
>
> It seems that the two optimizations are different. The original patch is
> about "completely disabling stats for passthrough-io", while your changes
> optimize getting the current time, which would improve performance for
> everyone.
>
> I think both of them are independent.

Yes they are, mine is just a general "we should do something like this
rather than play whack-a-mole on the issue side for time stamping". It
doesn't solve the completion side, which is why your patch is better for
passthrough as a whole. I do think we should add your patch; they are
orthogonal.
Did you send out a v2 we can queue up for 6.8?

-- 
Jens Axboe