On 9/2/22 3:25 PM, Jens Axboe wrote:
> On 9/2/22 1:32 PM, Jens Axboe wrote:
>> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Currently uring-cmd lacks the ability to leverage the pre-registered
>>>>>> buffers. This series adds the support in uring-cmd, and plumbs
>>>>>> nvme passthrough to work with it.
>>>>>>
>>>>>> Using registered-buffers showed peak-perf hike from 1.85M to 2.17M IOPS
>>>>>> in my setup.
>>>>>>
>>>>>> Without fixedbufs
>>>>>> *****************
>>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>>> ^CExiting on signal
>>>>>> Maximum IOPS=1.85M
>>>>>
>>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>>
>>>>> bdev (non pt)    122M IOPS
>>>>> irq driven       51-52M IOPS
>>>>> polled           71M IOPS
>>>>> polled+fixed     78M IOPS

Followup on this, since t/io_uring didn't correctly detect NUMA nodes
for passthrough.

With the current tree and the patchset I just sent for iopoll and the
caching fix that's in the block tree, here's the final score:

polled+fixed passthrough    105M IOPS

which is getting pretty close to the bdev polled fixed path as well.
I think that is starting to look pretty good!

[...]
submitter=22, tid=4768, file=/dev/ng22n1, node=8
submitter=23, tid=4769, file=/dev/ng23n1, node=8
polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=102.51M, BW=50.05GiB/s, IOS/call=32/31
IOPS=105.29M, BW=51.41GiB/s, IOS/call=31/32
IOPS=105.34M, BW=51.43GiB/s, IOS/call=32/31
IOPS=105.37M, BW=51.45GiB/s, IOS/call=32/32
IOPS=105.37M, BW=51.45GiB/s, IOS/call=31/31
IOPS=105.38M, BW=51.45GiB/s, IOS/call=31/31
IOPS=105.35M, BW=51.44GiB/s, IOS/call=32/32
IOPS=105.49M, BW=51.51GiB/s, IOS/call=32/31
^CExiting on signal
Maximum IOPS=105.49M

--
Jens Axboe
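
For anyone wanting to sanity-check the figures quoted in this thread, here is a rough sketch of the arithmetic. It is not part of t/io_uring. The helper names are made up for illustration, and it assumes BW is simply IOPS times the 512-byte block size (-b512) expressed in binary MiB, which seems to line up with the single-core run quoted above.

```python
# Rough sanity check of numbers quoted in the thread.
# Helper names are hypothetical; this is not t/io_uring code.

def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

def mib_per_sec(iops, block_size):
    """Bandwidth in MiB/s, assuming BW = IOPS * block size (binary MiB)."""
    return iops * block_size / (1024 * 1024)

# Single-core peak: 1.85M IOPS without fixedbufs, 2.17M with (millions of IOPS).
fixedbuf_gain = percent_gain(1.85, 2.17)

# Polled+fixed passthrough (105M) as a share of the bdev figure (122M).
pt_share_of_bdev = 105.0 / 122.0 * 100.0

# Bandwidth implied by 1.85M IOPS at 512-byte blocks.
single_core_bw = mib_per_sec(1.85e6, 512)

print(f"fixedbufs gain:        {fixedbuf_gain:.1f}%")
print(f"passthrough vs bdev:   {pt_share_of_bdev:.1f}%")
print(f"BW at 1.85M x 512B:    {single_core_bw:.0f} MiB/s")
```

The reported IOPS values are rounded, so the derived bandwidth will only match the tool's output to within a MiB/s or so.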