On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> Let cmds use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>
> The result should be the same as in the tests for the original
> IOU_F_TWQ_LAZY_WAKE [1] patchset, but for a quick test I took
> fio/t/io_uring with 4 threads, each reading their own drive and all
> pinned to the same CPU to make it CPU bound, and got a +10% throughput
> improvement.
>
> [1] https://lore.kernel.org/all/cover.1680782016.git.asml.silence@xxxxxxxxx/
>
> Pavel Begunkov (2):
>   io_uring/cmd: add cmd lazy tw wake helper
>   nvme: optimise io_uring passthrough completion
>
>  drivers/nvme/host/ioctl.c |  4 ++--
>  include/linux/io_uring.h  | 18 ++++++++++++++++--
>  io_uring/uring_cmd.c      | 16 ++++++++++++----
>  3 files changed, 30 insertions(+), 8 deletions(-)
>
>
> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
> --
> 2.40.0
>

I tried to run a few workloads on my setup with your patches applied.
However, I couldn't see any difference in io passthrough performance.
I might have missed something. Can you share the workload that you ran
which gave you the perf improvement?

Here is the workload that I ran -

Without your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0 -u1 -n1 /dev/ng0n1
submitter=0, tid=2049, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1382MiB/s, IOS/call=16/15
IOPS=2.82M, BW=1379MiB/s, IOS/call=16/16
IOPS=2.84M, BW=1388MiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=2.84M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=0, tid=2046, file=/dev/ng0n1, node=-1
submitter=1, tid=2047, file=/dev/ng1n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.72M, BW=2.79GiB/s, IOS/call=16/15
IOPS=5.71M, BW=2.79GiB/s, IOS/call=16/16
IOPS=5.70M, BW=2.78GiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=5.72M

With your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0 -u1 -n1 /dev/ng0n1
submitter=0, tid=2032, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1381MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1379MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1383MiB/s, IOS/call=15/15
Exiting on timeout
Maximum IOPS=2.83M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=1, tid=2037, file=/dev/ng1n1, node=-1
submitter=0, tid=2036, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.64M, BW=2.75GiB/s, IOS/call=15/15
IOPS=5.62M, BW=2.75GiB/s, IOS/call=16/16
IOPS=5.62M, BW=2.74GiB/s, IOS/call=16/16
Exiting on timeout
Maximum IOPS=5.64M

--
Anuj Gupta
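
For reference, the change being benchmarked above amounts to the nvme passthrough
completion queueing its task work with the lazy-wake flag instead of the regular
helper. Below is a simplified sketch of that pattern, not the exact patch: the
io_uring_cmd_do_in_task_lazy() name is assumed from the patch titles and may not
match the final API, and the real handler also deals with status propagation and
the iopoll path.

/*
 * Simplified sketch of an nvme uring_cmd completion switching to the
 * lazy task-work helper.  IOU_F_TWQ_LAZY_WAKE defers waking the
 * submitter until enough completions have been queued, instead of
 * waking it for every single CQE.
 */
static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
						blk_status_t err)
{
	struct io_uring_cmd *ioucmd = req->end_io_data;

	/* Old path: one wakeup per completed passthrough command. */
	/* io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb); */

	/* New path (helper name assumed from the patch titles): queue
	 * the task work with IOU_F_TWQ_LAZY_WAKE semantics. */
	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);

	return RQ_END_IO_NONE;
}

With lazy wake the submitter is only woken once enough CQEs are queued to satisfy
its wait count, which matches the cover letter's note that the gain was measured
with several submitters pinned to the same CPU to make the run CPU bound.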