The patch series covers the points discussed in the November 2021 virtual
call [LSF/MM/BFP TOPIC] Storage: Copy Offload [0].
We have covered the initially agreed requirements in this patchset, along
with additional features suggested by the community.
The patchset borrows Mikulas's token-based approach for the two-bdev
implementation.

This is on top of our previous patchset v5 [1].

Overall series supports:
========================
	1. Driver
		- NVMe Copy command (single NS, TP 4065), including support
		  in nvme-target (for block and file backends).

	2. Block layer
		- Block-generic copy (REQ_COPY flag), with an interface
		  accommodating two block devices, and a
		  multi-source/destination interface
		- Emulation, when offload is natively absent
		- dm-linear support (for cases not requiring split)

	3. User interface
		- new ioctl

	4. In-kernel user
		- dm-kcopyd

Testing
=======
	Copy offload can be tested on:
	a. QEMU: NVMe simple copy (TP 4065), by setting the nvme-ns
	   parameters mssrl, mcl and msrc. For more info, see [2].
	b. Fabrics loopback.
	c. blktests [3] (tests block/032,033 and nvme/046,047,048,049).

	Emulation can be tested on any device.

	A sample application that uses the ioctl is included in the patch
	description. fio [4] can also be used for testing.

Performance
===========
	With the async design of copy emulation/offload, using fio [4], we
	observed the following improvements compared to userspace read and
	write on an NVMeOF TCP setup:

	Setup1: Network Speed: 1000Mb/s
		Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
		Target PC: AMD Ryzen 9 5900X 12-Core Processor
		block-size 8k, range 1:
			635% improvement in IO BW (107 MiB/s to 787 MiB/s).
			Network utilisation drops from 97% to 14%.
		block-size 2M, range 16:
			2555% improvement in IO BW (100 MiB/s to 2655 MiB/s).
			Network utilisation drops from 89% to 0.62%.
	Setup2: Network Speed: 100Gb/s
		Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
		(host and target have the same configuration)
		block-size 8k, range 1:
			6.5% improvement in IO BW (791 MiB/s to 843 MiB/s).
			Network utilisation drops from 6.75% to 0.14%.
		block-size 2M, range 16:
			15% improvement in IO BW (1027 MiB/s to 1183 MiB/s).
			Network utilisation drops from 8.42% to ~0%.
		block-size 8k, range 8:
			18% drop in IO BW (from 798 MiB/s to 647 MiB/s).
			Network utilisation drops from 6.66% to 0.13%.

		At present we see a drop in performance for block sizes 8k
		and 16k with higher range counts (8, 16), which needs
		further investigation.

	Overall, in these tests, kernel copy emulation performs better than
	userspace read+write.

Blktests [3]
============
	tests/block/032,033: Run copy offload and emulation on a block
	device.
	tests/nvme/046,047,048,049: Create a loop-backed fabrics device and
	run copy offload and emulation.

Future Work
===========
	- null_blk: copy-offload emulation.
	- generic copy file range (CFR):
		We explored the possibility of using the block device
		def_blk_fops, but saw a major disadvantage for in-kernel
		users: no fd is available to an in-kernel user [5].
	- loopback device copy offload support
	- upstream fio to use copy offload

	These are to be taken up after we reach consensus on the plumbing
	of the current elements that are part of this series.

Additional links:
=================
	[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@xxxxxxxxxxxxxx/
	[1] https://lore.kernel.org/lkml/20221130041450.GA17533@test-zns/T/
	[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
	[3] https://github.com/nitesh-shetty/blktests/tree/feat/copy_offload/v6
	[4] https://github.com/vincentkfu/fio/tree/copyoffload
	[5] https://lore.kernel.org/lkml/20221130041450.GA17533@test-zns/T/#m0e2754202fc2223e937c8e7ba3cf7336a93f97a3

Changes since v5:
=================
	- Addition of blktests (Chaitanya Kulkarni)
	- Minor fix for fabrics file backed path
	- Remove buggy zonefs copy file range implementation.
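To make the nvme-ns copy limits mentioned under Testing (mssrl, mcl, msrc) concrete, here is a minimal userspace C sketch, not code from this series, of how a contiguous copy could be carved into NVMe Copy commands that respect them. It assumes the TP 4065 meanings: MSSRL caps the blocks in one source range, MCL caps the total blocks per Copy command, and MSRC is a 0's based cap on source ranges per command. `struct copy_limits`, `struct copy_cmd` and `plan_copy()` are hypothetical names; counts are kept 1-based here, whereas a real command encodes NR and NLB as 0's based values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the namespace copy limits (TP 4065 semantics). */
struct copy_limits {
	uint32_t mssrl;	/* max logical blocks in one source range */
	uint32_t mcl;	/* max total logical blocks per Copy command */
	uint8_t msrc;	/* max source ranges per command, 0's based */
};

/* Hypothetical plan for one Copy command; counts are 1-based here. */
struct copy_cmd {
	uint64_t sdlba;		/* destination starting LBA */
	uint64_t slba;		/* first source LBA (sources contiguous) */
	uint32_t nlb;		/* total blocks copied by this command */
	uint32_t nranges;	/* source range entries needed */
};

/*
 * Split a contiguous copy of 'len' blocks from 'src' to 'dst' into Copy
 * commands that respect 'lim'. Returns the number of commands stored in
 * 'cmds', or 0 if the limits are zero, 'len' is zero, or 'max_cmds' is
 * too small.
 */
size_t plan_copy(const struct copy_limits *lim, uint64_t src, uint64_t dst,
		 uint64_t len, struct copy_cmd *cmds, size_t max_cmds)
{
	uint64_t per_cmd;
	size_t n = 0;

	if (lim->mssrl == 0 || lim->mcl == 0)
		return 0;

	/* (msrc + 1) ranges of at most mssrl blocks, capped by mcl. */
	per_cmd = (uint64_t)lim->mssrl * ((uint64_t)lim->msrc + 1);
	if (per_cmd > lim->mcl)
		per_cmd = lim->mcl;

	while (len > 0) {
		uint64_t chunk = len < per_cmd ? len : per_cmd;

		if (n == max_cmds)
			return 0;
		cmds[n].sdlba = dst;
		cmds[n].slba = src;
		cmds[n].nlb = (uint32_t)chunk;
		/* ceil(chunk / mssrl) source range entries */
		cmds[n].nranges = (uint32_t)((chunk + lim->mssrl - 1) /
					     lim->mssrl);
		src += chunk;
		dst += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

With mssrl=128, mcl=512 and msrc=3, a 1000-block copy becomes two commands of 512 and 488 blocks, each needing four source ranges. Reducing the three device limits to one per-command block budget is in the same spirit as the v5 series merging its copy limits into one, though the in-kernel split logic may differ.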
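The Performance results are measured against userspace read and write. A minimal sketch of that kind of baseline follows, assuming byte-offset ranges within a single file descriptor and non-overlapping source and destination; `struct byte_range` and `copy_ranges_rw()` are hypothetical names, and this is neither the series' in-kernel emulation nor the fio job that produced the numbers. Every byte makes a round trip through a host buffer, which on an NVMeOF setup is the network traffic that copy offload removes.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical range descriptor: byte offsets within one fd. */
struct byte_range {
	off_t src;
	off_t dst;
	size_t len;
};

/*
 * Copy each range with pread()/pwrite() through a bounce buffer of
 * 'bufsz' bytes. Assumes source and destination do not overlap.
 * Returns 0 on success, -1 on error with errno set.
 */
int copy_ranges_rw(int fd, const struct byte_range *r, size_t nr,
		   size_t bufsz)
{
	char *buf;
	size_t i;

	if (bufsz == 0) {
		errno = EINVAL;
		return -1;
	}
	buf = malloc(bufsz);
	if (!buf)
		return -1;

	for (i = 0; i < nr; i++) {
		size_t done = 0;

		while (done < r[i].len) {
			size_t want = r[i].len - done;
			size_t off = 0;
			ssize_t got;

			if (want > bufsz)
				want = bufsz;
			/* Read a chunk of the source into host memory... */
			got = pread(fd, buf, want, r[i].src + (off_t)done);
			if (got <= 0) {
				if (got == 0)
					errno = EIO; /* source ended early */
				goto fail;
			}
			/* ...then write it back out at the destination. */
			while (off < (size_t)got) {
				ssize_t w = pwrite(fd, buf + off,
						   (size_t)got - off,
						   r[i].dst + (off_t)(done + off));
				if (w <= 0) {
					if (w == 0)
						errno = EIO;
					goto fail;
				}
				off += (size_t)w;
			}
			done += (size_t)got;
		}
	}
	free(buf);
	return 0;
fail:
	free(buf);
	return -1;
}
```

A small bounce buffer forces many read/write round trips per range; the in-kernel path avoids the user/kernel copies, and native offload avoids moving the data through the host at all.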
Changes since v4:
=================
	- make the offload and emulation design asynchronous (Hannes Reinecke)
	- fabrics loopback support
	- sysfs naming improvements (Damien Le Moal)
	- use kfree() instead of kvfree() in cio_await_completion (Damien Le Moal)
	- use ranges instead of rlist to represent range_entry (Damien Le Moal)
	- change argument ordering in blk_copy_offload as suggested (Damien Le Moal)
	- removed multiple copy limits and merged them into a single limit (Damien Le Moal)
	- wrap overly long lines (Damien Le Moal)
	- other naming improvements and cleanups (Damien Le Moal)
	- correctly format the code example in description (Damien Le Moal)
	- mark blk_copy_offload as static (kernel test robot)

Changes since v3:
=================
	- added copy_file_range support for zonefs
	- added documentation about new sysfs entries
	- incorporated review comments on v3
	- minor fixes

Changes since v2:
=================
	- fixed possible race condition reported by Damien Le Moal
	- new sysfs controls as suggested by Damien Le Moal
	- fixed possible memory leak reported by Dan Carpenter, lkp
	- minor fixes

Nitesh Shetty (9):
  block: Introduce queue limits for copy-offload support
  block: Add copy offload support infrastructure
  block: add emulation for copy
  block: Introduce a new ioctl for copy
  nvme: add copy offload support
  nvmet: add copy command support for bdev and file ns
  dm: Add support for copy offload.
  dm: Enable copy offload for dm-linear target
  dm kcopyd: use copy offload support

 Documentation/ABI/stable/sysfs-block |  36 ++
 block/blk-lib.c                      | 597 +++++++++++++++++++++++++++
 block/blk-map.c                      |   4 +-
 block/blk-settings.c                 |  24 ++
 block/blk-sysfs.c                    |  64 +++
 block/blk.h                          |   2 +
 block/ioctl.c                        |  36 ++
 drivers/md/dm-kcopyd.c               |  56 ++-
 drivers/md/dm-linear.c               |   1 +
 drivers/md/dm-table.c                |  42 ++
 drivers/md/dm.c                      |   7 +
 drivers/nvme/host/constants.c        |   1 +
 drivers/nvme/host/core.c             | 106 ++++-
 drivers/nvme/host/fc.c               |   5 +
 drivers/nvme/host/nvme.h             |   7 +
 drivers/nvme/host/pci.c              |  27 +-
 drivers/nvme/host/rdma.c             |   7 +
 drivers/nvme/host/tcp.c              |  16 +
 drivers/nvme/host/trace.c            |  19 +
 drivers/nvme/target/admin-cmd.c      |   9 +-
 drivers/nvme/target/io-cmd-bdev.c    |  79 ++++
 drivers/nvme/target/io-cmd-file.c    |  52 +++
 drivers/nvme/target/loop.c           |   6 +
 drivers/nvme/target/nvmet.h          |   2 +
 include/linux/blk_types.h            |  44 ++
 include/linux/blkdev.h               |  18 +
 include/linux/device-mapper.h        |   5 +
 include/linux/nvme.h                 |  43 +-
 include/uapi/linux/fs.h              |  27 ++
 29 files changed, 1324 insertions(+), 18 deletions(-)

base-commit: 469a89fd3bb73bb2eea628da2b3e0f695f80b7ce
-- 
2.35.1.500.gb896f729e2