Re: [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough

On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> Let cmds use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>
> The result should be the same as in the tests for the original
> IOU_F_TWQ_LAZY_WAKE [1] patchset, but for a quick test I took
> fio/t/io_uring with 4 threads, each reading its own drive and all
> pinned to the same CPU to make it CPU bound, and got a +10%
> throughput improvement.
>
> [1] https://lore.kernel.org/all/cover.1680782016.git.asml.silence@xxxxxxxxx/
>
> Pavel Begunkov (2):
>   io_uring/cmd: add cmd lazy tw wake helper
>   nvme: optimise io_uring passthrough completion
>
>  drivers/nvme/host/ioctl.c |  4 ++--
>  include/linux/io_uring.h  | 18 ++++++++++++++++--
>  io_uring/uring_cmd.c      | 16 ++++++++++++----
>  3 files changed, 30 insertions(+), 8 deletions(-)
>
>
> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
> --
> 2.40.0
>

I tried running a few workloads on my setup with your patches applied, but I
couldn't see any difference in io passthrough performance. I might have missed
something. Can you share the workload that you ran which gave you the perf
improvement? Here is the workload that I ran -

Without your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2049, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1382MiB/s, IOS/call=16/15
IOPS=2.82M, BW=1379MiB/s, IOS/call=16/16
IOPS=2.84M, BW=1388MiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=2.84M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=0, tid=2046, file=/dev/ng0n1, node=-1
submitter=1, tid=2047, file=/dev/ng1n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.72M, BW=2.79GiB/s, IOS/call=16/15
IOPS=5.71M, BW=2.79GiB/s, IOS/call=16/16
IOPS=5.70M, BW=2.78GiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=5.72M

With your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2032, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1381MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1379MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1383MiB/s, IOS/call=15/15
Exiting on timeout
Maximum IOPS=2.83M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=1, tid=2037, file=/dev/ng1n1, node=-1
submitter=0, tid=2036, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.64M, BW=2.75GiB/s, IOS/call=15/15
IOPS=5.62M, BW=2.75GiB/s, IOS/call=16/16
IOPS=5.62M, BW=2.74GiB/s, IOS/call=16/16
Exiting on timeout
Maximum IOPS=5.64M
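
For reference, my best guess at the CPU-bound setup described in the cover
letter (4 submitters, one per drive, all pinned to a single CPU) is something
like the command below. Note that the extra device paths (/dev/ng2n1,
/dev/ng3n1) and the reuse of my flag values are assumptions on my part, not
taken from the cover letter:

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n4 /dev/ng0n1 /dev/ng1n1 /dev/ng2n1 /dev/ng3n1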

--
Anuj Gupta



