Hi,

uring-cmd lacks the ability to leverage pre-registered buffers. This
series adds that support to uring-cmd, and plumbs nvme passthrough to
work with it. Patches 3 and 4 contain a number of general nvme cleanups,
which were added over the course of the iterations. Patches 11, 12 and
13 carve out a block helper, which scsi and nvme then use to avoid
duplicating code.

Using registered buffers showed an IOPS increase from 2.61M to 3.15M.

Without fixedbufs
*****************

# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=2481, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=0/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=2.60M, BW=1271MiB/s, IOS/call=32/31
IOPS=2.60M, BW=1271MiB/s, IOS/call=32/32
IOPS=2.61M, BW=1272MiB/s, IOS/call=32/32
IOPS=2.59M, BW=1266MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=2.61M

With fixedbufs
**************

# taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1
submitter=0, tid=2487, file=/dev/ng0n1, node=-1
polled=1, fixedbufs=1/0, register_files=1, buffered=1, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=3.15M, BW=1540MiB/s, IOS/call=32/31
IOPS=3.15M, BW=1538MiB/s, IOS/call=32/32
IOPS=3.15M, BW=1536MiB/s, IOS/call=32/32
IOPS=3.15M, BW=1537MiB/s, IOS/call=32/32
^CExiting on signal
Maximum IOPS=3.15M

Changes since v10:
- Patch 3: Fix overly long line (Christoph)
- Patch 4: Create a helper in block-map for vectored and non-vectored
  I/O, to be used by scsi and nvme (Christoph)
- Patch 5: Rename bio_map_get to blk_rq_map_bio_alloc and bio_map_put
  to blk_mq_map_bio_put (Christoph)
- Patch 6: Split it into a prep patch and avoid duplicate checks
  (Christoph)
- Patch 7: Put changes to pass ubuffer as an integer in a separate prep
  patch and simplify condition checks in nvme (Christoph)

Changes since v9:
- Patch 6: Make blk_rq_map_user_iov() operate on a bvec iterator
  (Christoph)
- Patch 7: Change nvme to use the above

Changes since v8:
- Split some patches further; now 7 patches rather than 5 (Christoph)
- Applied a number of other suggested cleanups (Christoph)

Changes since v7:
- Patch 3: Added many cleanups/refactorings suggested by Christoph
- Patch 4: Added a page-copying fallback for the
  bounce-buffer/dma-alignment case (Christoph)

Changes since v6:
- Patch 1: Fix warning for io_uring_cmd_import_fixed (robot)

Changes since v5:
- Patch 4: Newly added, to split an nvme function into two
- Patch 3: Folded cleanups in bio_map_user_iov (Chaitanya, Pankaj)
- Rebase to latest for-next

Changes since v4:
- Patch 1, 2: Folded all review comments from Jens

Changes since v3:
- uring_cmd_flags: change from u16 to u32 (Jens)
- Patch 3: add another helper to reduce code duplication (Jens)

Changes since v2:
- Kill the new opcode, add a flag instead (Pavel)
- Fix standalone build issue with patch 1 (Pavel)

Changes since v1:
- Fix a naming issue for an exported helper

Anuj Gupta (6):
  io_uring: add io_uring_cmd_import_fixed
  io_uring: introduce fixed buffer support for io_uring_cmd
  block: rename bio_map_put to blk_mq_map_bio_put
  block: add blk_rq_map_user_io
  scsi: Use blk_rq_map_user_io helper
  nvme: Use blk_rq_map_user_io helper

Kanchan Joshi (7):
  nvme: refactor nvme_add_user_metadata
  nvme: refactor nvme_alloc_request
  block: factor out blk_rq_map_bio_alloc helper
  block: add blk_rq_map_user_bvec
  block: extend functionality to map bvec iterator
  nvme: pass ubuffer as an integer
  nvme: wire up fixed buffer support for nvme passthrough

 block/blk-map.c               | 150 ++++++++++++++++++++++++++++++----
 drivers/nvme/host/ioctl.c     | 149 +++++++++++++++++++--------------
 drivers/scsi/scsi_ioctl.c     |  22 +----
 drivers/scsi/sg.c             |  22 +----
 include/linux/blk-mq.h        |   2 +
 include/linux/io_uring.h      |  10 ++-
 include/uapi/linux/io_uring.h |   9 ++
 io_uring/uring_cmd.c          |  26 +++++-
 8 files changed, 268 insertions(+), 122 deletions(-)

-- 
2.25.1
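As a quick arithmetic cross-check of the t/io_uring runs (a sketch, not part of the series): with 512-byte blocks (-b512), the BW column is consistent with IOPS x 512 bytes expressed in MiB/s (2^20 bytes), and the two maximum IOPS figures correspond to a gain of roughly 21%. The helper names below are hypothetical and exist only for this illustration.

```c
/* Block size passed to t/io_uring via -b512, in bytes. */
#define BLOCK_SIZE_BYTES 512.0

/* Hypothetical helper: convert an IOPS figure into MiB/s (2^20 bytes). */
static double iops_to_mib_per_s(double iops)
{
	return iops * BLOCK_SIZE_BYTES / (1024.0 * 1024.0);
}

/* Hypothetical helper: relative change between two IOPS figures, in percent. */
static double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For 2.61M IOPS this gives about 1274 MiB/s (the log shows 1272 MiB/s, consistent with the IOPS column itself being rounded), for 3.15M IOPS about 1538 MiB/s, and percent_gain(2.61e6, 3.15e6) is about 20.7.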
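The gain comes from the pages of a registered buffer being pinned once, at registration time, so a command that uses it only has to validate its range against that buffer and compute an offset, instead of pinning user pages on every I/O. The following is a simplified user-space model of that range check, assuming a hypothetical flat table of [ubuf, ubuf_end) ranges. struct reg_buf, struct fixed_slice and import_fixed_model are illustrative names, not kernel APIs; the real io_uring_cmd_import_fixed path additionally builds a bvec iterator over the pinned pages.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one registered buffer: the user-space
 * range [ubuf, ubuf_end) whose pages were pinned at registration.
 */
struct reg_buf {
	uint64_t ubuf;
	uint64_t ubuf_end;
};

/* Hypothetical result: which registered buffer to use, and where in it. */
struct fixed_slice {
	const struct reg_buf *buf;
	uint64_t offset; /* byte offset of the request inside buf */
	uint64_t len;
};

/*
 * Validate a request against the registered-buffer table and compute
 * its offset. No page pinning happens here; that cost was paid once
 * when the buffers were registered.
 */
static int import_fixed_model(const struct reg_buf *table, size_t nr_bufs,
			      uint16_t buf_index, uint64_t addr, uint64_t len,
			      struct fixed_slice *out)
{
	const struct reg_buf *imu;
	uint64_t end;

	/* The index supplied with the command must name a registered buffer. */
	if (buf_index >= nr_bufs)
		return -EFAULT;
	imu = &table[buf_index];

	/* Reject addr + len wrap-around (GCC/Clang builtin). */
	if (__builtin_add_overflow(addr, len, &end))
		return -EFAULT;

	/* The whole [addr, addr + len) range must lie inside the buffer. */
	if (addr < imu->ubuf || end > imu->ubuf_end)
		return -EFAULT;

	out->buf = imu;
	out->offset = addr - imu->ubuf;
	out->len = len;
	return 0;
}
```

For example, with a single registered range [0x1000, 0x3000), a 0x200-byte request at 0x1800 resolves to offset 0x800, while one at 0x2f00 is rejected with -EFAULT because it runs past the end of the buffer.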