On 9/2/22 9:16 AM, Kanchan Joshi wrote:
> Hi,
>
> Currently uring-cmd lacks the ability to leverage the pre-registered
> buffers. This series adds the support in uring-cmd, and plumbs
> nvme passthrough to work with it.
>
> Using registered-buffers showed peak-perf hike from 1.85M to 2.17M IOPS
> in my setup.
>
> Without fixedbufs
> *****************
> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
> ^CExiting on signal
> Maximum IOPS=1.85M

With the poll support queued up, I ran this one as well. tldr is:

bdev (non pt)   122M IOPS
irq driven      51-52M IOPS
polled          71M IOPS
polled+fixed    78M IOPS

Looking at profiles, it looks like the bio is still being allocated
and freed and not dipping into the alloc cache, which is using a
substantial amount of CPU. I'll poke a bit and see what's going on...

-- 
Jens Axboe