Hi,

uring-cmd lacks the ability to leverage the pre-registered buffers.
This series adds that support in uring-cmd, and plumbs nvme passthrough
to work with it.
Patches 3 - 5 carve out a block helper, which scsi and nvme then use to
avoid code duplication.
Patches 6 and 7 contain a bunch of general nvme cleanups, which got added
along the iterations.

Using registered buffers showed a ~20% IOPS hike, from 2.62M to 3.17M, in
my setup.

Without fixedbufs
*****************
# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=3623, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=0/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=2.62M, BW=1281MiB/s, IOS/call=32/31
IOPS=2.62M, BW=1277MiB/s, IOS/call=32/32
IOPS=2.62M, BW=1277MiB/s, IOS/call=32/32
IOPS=2.61M, BW=1276MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=2.62M

With fixedbufs
**************
# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=3627, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.17M, BW=1546MiB/s, IOS/call=32/31
IOPS=3.17M, BW=1546MiB/s, IOS/call=32/31
IOPS=3.17M, BW=1546MiB/s, IOS/call=32/32
IOPS=3.16M, BW=1544MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=3.17M

Changes since v11:
Patch 2 - Add a check for flags (Jens)
Patch 3 - Moved the refactoring patches to start, before the
nvme-refactoring patches (Christoph)
Patch 3 - Initialize ret to 0, to prevent uninitialized variable warning
(kernel test robot)
Patch 4 - Added the onstack advantage part in the commit description
(Christoph)
Patch 7 - Move blk_rq_free_request into nvme_map_user_request to handle
error scenarios, instead of doing it using goto in its callers; this
helps in getting rid of an uninitialized variable warning
(kernel test robot)
Patch 10 - Folded it in with the next patch to avoid compiler warning for
unused static functions (Christoph)

Changes since v10:
- Patch 3: Fix overly long line (Christoph)
- Patch 4: create a helper in block-map for vectored and non-vectored-io,
  to be used by scsi and nvme (Christoph)
- Patch 5: Rename bio_map_get to blk_rq_map_bio_alloc and bio_map_put to
  blk_mq_map_bio_put (Christoph)
- Patch 6: Split it into a prep patch and avoid duplicate checks
  (Christoph)
- Patch 7: Put changes to pass ubuffer as an integer in a separate prep
  patch and simplify condition checks in nvme (Christoph)

Changes since v9:
- Patch 6: Make blk_rq_map_user_iov() operate on bvec iterator
  (Christoph)
- Patch 7: Change nvme to use the above

Changes since v8:
- Split some patches further; now 7 patches rather than 5 (Christoph)
- Applied a bunch of other suggested cleanups (Christoph)

Changes since v7:
- Patch 3: added many cleanups/refactoring suggested by Christoph
- Patch 4: added copying-pages fallback for bounce-buffer/dma-alignment
  case (Christoph)

Changes since v6:
- Patch 1: fix warning for io_uring_cmd_import_fixed (robot)

Changes since v5:
- Patch 4: newly added, to split a nvme function into two
- Patch 3: folded cleanups in bio_map_user_iov (Chaitanya, Pankaj)
- Rebase to latest for-next

Changes since v4:
- Patch 1, 2: folded all review comments of Jens

Changes since v3:
- uring_cmd_flags, change from u16 to u32 (Jens)
- patch 3, add another helper to reduce code-duplication (Jens)

Changes since v2:
- Kill the new opcode, add a flag instead (Pavel)
- Fix standalone build issue with patch 1 (Pavel)

Changes since v1:
- Fix a naming issue for an exported helper

Anuj Gupta (6):
  io_uring: add io_uring_cmd_import_fixed
  io_uring: introduce fixed buffer support for io_uring_cmd
  block: add blk_rq_map_user_io
  scsi: Use blk_rq_map_user_io helper
  nvme: Use blk_rq_map_user_io helper
  block: rename bio_map_put to blk_mq_map_bio_put

Kanchan Joshi (6):
  nvme: refactor nvme_add_user_metadata
  nvme: refactor nvme_alloc_request
  block: factor out blk_rq_map_bio_alloc helper
  block: extend functionality to map bvec iterator
  nvme: pass ubuffer as an integer
  nvme: wire up fixed buffer support for nvme passthrough

 block/blk-map.c               | 150 ++++++++++++++++++++++++++++++----
 drivers/nvme/host/ioctl.c     | 144 ++++++++++++++++++--------------
 drivers/scsi/scsi_ioctl.c     |  22 +----
 drivers/scsi/sg.c             |  22 +----
 include/linux/blk-mq.h        |   2 +
 include/linux/io_uring.h      |  10 ++-
 include/uapi/linux/io_uring.h |   9 ++
 io_uring/uring_cmd.c          |  28 ++++++-
 8 files changed, 266 insertions(+), 121 deletions(-)

-- 
2.25.1
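
For anyone wanting to sanity-check the headline numbers, here is a small
sketch (not part of the series) that recomputes the relative IOPS gain and
the bandwidth implied by the 512-byte block size (-b512) used in both runs.
The function and constant names are made up for illustration; the inputs
are the "Maximum IOPS" values quoted above.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


def bandwidth_mib_s(iops, block_size_bytes):
    """Bandwidth in MiB/s implied by an IOPS figure and a fixed block size."""
    return iops * block_size_bytes / (1024 * 1024)


# Values taken from the two t/io_uring runs in the cover letter.
IOPS_WITHOUT_FIXEDBUFS = 2.62e6
IOPS_WITH_FIXEDBUFS = 3.17e6
BLOCK_SIZE_BYTES = 512  # from the -b512 option

if __name__ == "__main__":
    gain = percent_gain(IOPS_WITHOUT_FIXEDBUFS, IOPS_WITH_FIXEDBUFS)
    print(f"IOPS gain: {gain:.1f}%")
    for label, iops in (("without", IOPS_WITHOUT_FIXEDBUFS),
                        ("with", IOPS_WITH_FIXEDBUFS)):
        bw = bandwidth_mib_s(iops, BLOCK_SIZE_BYTES)
        print(f"{label} fixedbufs: ~{bw:.0f} MiB/s")
```

The gain works out to roughly 21%, in line with the "~20%" stated above,
and the implied bandwidths land within a couple of MiB/s of the BW column
in both runs (the small gap comes from the IOPS figures being rounded to
three digits).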