The patch series covers the points discussed in the past and most recently
at LSFMM'23[0].
We have covered the initially agreed requirements in this patch set, along
with additional features suggested by the community.

This is the next iteration of our previous patch set v13[1].
We achieve copy offload by sending 2 bios, carrying the source and
destination info, and merging them to form a request. This request is sent
to the driver. So this design works only for request-based storage drivers.

Overall series supports:
========================
	1. Driver
		- NVMe Copy command (single NS, TP 4065), including support
		  in nvme-target (for block and file back end).

	2. Block layer
		- Block-generic copy (REQ_OP_COPY_DST/SRC), operation with
		  interface accommodating two block-devs
		- Merging copy requests in request layer
		- Emulation, for in-kernel users when offload is natively
		  absent
		- dm-linear support (for cases not requiring split)

	3. User-interface
		- copy_file_range

Testing
=======
	Copy offload can be tested on:
	a. QEMU: NVMe simple copy (TP 4065), by setting the nvme-ns
	   parameters mssrl, mcl and msrc. For more info see [2].
	b. Null block device
	c. NVMe Fabrics loopback.
	d. blktests[3]

	Emulation can be tested on any device.

	fio[4].

Infra and plumbing:
===================
	We populate the copy_file_range callback in def_blk_fops.
	For devices that support copy offload, use blkdev_copy_offload to
	achieve in-device copy.
	However, for cases where the device doesn't support offload,
	fall back to generic_copy_file_range.
	For in-kernel users (like NVMe fabrics), use blkdev_copy_offload
	if the device is copy offload capable, or else fall back to
	emulation using blkdev_copy_emulation.
	Modify checks in generic_copy_file_range to support block devices.

Blktests[3]
======================
	tests/block/035-040: Runs copy offload and emulation on null
			     block device.
	tests/block/050,055: Runs copy offload and emulation on test
			     nvme block device.
	tests/nvme/056-067: Create a loop backed fabrics device and
			    run copy offload and emulation.

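To illustrate the user interface described above, here is a minimal
userspace sketch of a caller of copy_file_range(2). It is an assumption of
how an application might drive the interface, not code from this series.
The helper names copy_range() and copy_by_rw() are hypothetical, and the
read/write fallback is only a userspace stand-in; it is not the in-kernel
blkdev_copy_emulation path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical fallback: plain pread/pwrite loop, used only when
 * copy_file_range() is not usable. The buffer is 4096-byte aligned so
 * the loop can also work on descriptors opened with O_DIRECT (assuming
 * offsets and lengths are suitably aligned as well).
 */
static ssize_t copy_by_rw(int fd_in, off_t off_in, int fd_out, off_t off_out,
			  size_t len)
{
	enum { BUF_SZ = 1 << 16, ALIGN = 4096 };
	size_t done = 0;
	ssize_t err = 0;
	void *buf;

	if (posix_memalign(&buf, ALIGN, BUF_SZ))
		return -ENOMEM;

	while (done < len) {
		size_t want = len - done < BUF_SZ ? len - done : BUF_SZ;
		ssize_t r = pread(fd_in, buf, want, off_in + (off_t)done);
		ssize_t w;

		if (r < 0) {
			err = -errno;
			break;
		}
		if (r == 0)	/* end of source */
			break;
		w = pwrite(fd_out, buf, (size_t)r, off_out + (off_t)done);
		if (w <= 0) {
			err = w < 0 ? -errno : -EIO;
			break;
		}
		done += (size_t)w;	/* short writes are retried */
	}
	free(buf);
	return err ? err : (ssize_t)done;
}

/*
 * Copy up to len bytes between two descriptors. Returns the number of
 * bytes copied or a negative errno value.
 */
static ssize_t copy_range(int fd_in, off_t off_in, int fd_out, off_t off_out,
			  size_t len)
{
	size_t done = 0;

	assert(fd_in >= 0 && fd_out >= 0);

	while (done < len) {
		off_t in = off_in + (off_t)done;
		off_t out = off_out + (off_t)done;
		ssize_t n = copy_file_range(fd_in, &in, fd_out, &out,
					    len - done, 0);

		if (n > 0) {
			done += (size_t)n;
			continue;
		}
		if (n == 0)		/* end of source */
			break;
		if (errno == EINTR)
			continue;
		if (errno == ENOSYS || errno == EXDEV ||
		    errno == EINVAL || errno == EOPNOTSUPP) {
			/* Syscall unusable here: copy the rest by hand. */
			ssize_t r = copy_by_rw(fd_in, in, fd_out, out,
					       len - done);
			return r < 0 ? r : (ssize_t)(done + (size_t)r);
		}
		return -errno;
	}
	return (ssize_t)done;
}
```

With this series applied, a caller would presumably open both block
devices with O_DIRECT (for example open(path, O_RDWR | O_DIRECT), where
path is a placeholder for a device node) and call copy_range() with
offsets and a length aligned to the logical block size. Whether the copy
is then offloaded to the device or handled by generic_copy_file_range is
decided in the kernel and is not visible to this code.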
Future Work
===========
	- loopback device copy offload support
	- upstream fio to use copy offload
	- upstream blktest to test copy offload
	- update man pages for copy_file_range
	- expand in-kernel users of copy offload

	These are to be taken up after this minimal series is agreed upon.

Additional links:
=================
	[0] https://lore.kernel.org/linux-nvme/CA+1E3rJ7BZ7LjQXXTdX+-0Edz=zT14mmPGMiVCzUgB33C60tbQ@xxxxxxxxxxxxxx/
	    https://lore.kernel.org/linux-nvme/f0e19ae4-b37a-e9a3-2be7-a5afb334a5c3@xxxxxxxxxx/
	    https://lore.kernel.org/linux-nvme/20230113094648.15614-1-nj.shetty@xxxxxxxxxxx/
	[1] https://lore.kernel.org/linux-nvme/20230627183629.26571-1-nj.shetty@xxxxxxxxxxx/
	[2] https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#simple-copy
	[3] https://github.com/nitesh-shetty/blktests/tree/feat/copy_offload/v14
	[4] https://github.com/OpenMPDK/fio/tree/copyoffload-3.35-v14

Changes since v13:
=================
	- block:
	    1. Simplified copy offload and emulation helpers, now
		  caller needs to decide between offload/emulation fallback
	    2. src,dst bio order change (Christoph Hellwig)
	    3. refcount changes similar to dio (Christoph Hellwig)
	    4. Single outstanding IO for copy emulation (Christoph Hellwig)
	    5. use copy_max_sectors to identify copy offload
		  capability and other reviews (Damien, Christoph)
	    6. Return status in endio handler (Christoph Hellwig)
	- nvme-fabrics: fallback to emulation in case of partial
	  offload completion
	- in-kernel user addition (Ming Lei)
	- indentation, documentation, minor fixes, misc changes (Damien,
	  Christoph)
	- blktests changes to test kernel changes

Changes since v12:
=================
	- block,nvme: Replaced token based approach with request based
	  single namespace capable approach (Christoph Hellwig)

Changes since v11:
=================
	- Documentation: Improved documentation (Damien Le Moal)
	- block,nvme: ssize_t return values (Darrick J. Wong)
	- block: token is allocated to SECTOR_SIZE (Matthew Wilcox)
	- block: mem leak fix (Maurizio Lombardi)

Changes since v10:
=================
	- NVMeOF: optimization in NVMe fabrics (Chaitanya Kulkarni)
	- NVMeOF: sparse warnings (kernel test robot)

Changes since v9:
=================
	- null_blk, improved documentation, minor fixes (Chaitanya Kulkarni)
	- fio, expanded testing and minor fixes (Vincent Fu)

Changes since v8:
=================
	- null_blk, copy_max_bytes_hw is made configfs parameter
	  (Damien Le Moal)
	- Negative error handling in copy_file_range (Christian Brauner)
	- minor fixes, better documentation (Damien Le Moal)
	- fio upgraded to 3.34 (Vincent Fu)

Changes since v7:
=================
	- null block copy offload support for testing (Damien Le Moal)
	- adding direct flag check for copy offload to block device,
	  as we are using generic_copy_file_range for cached cases.
	- Minor fixes

Changes since v6:
=================
	- copy_file_range instead of ioctl for direct block device
	- Remove support for multi range (vectored) copy
	- Remove ioctl interface for copy.
	- Remove offload support in dm kcopyd.

Changes since v5:
=================
	- Addition of blktests (Chaitanya Kulkarni)
	- Minor fix for fabrics file backed path
	- Remove buggy zonefs copy file range implementation.

Changes since v4:
=================
	- make the offload and emulation design asynchronous (Hannes
	  Reinecke)
	- fabrics loopback support
	- sysfs naming improvements (Damien Le Moal)
	- use kfree() instead of kvfree() in cio_await_completion
	  (Damien Le Moal)
	- use ranges instead of rlist to represent range_entry (Damien
	  Le Moal)
	- change argument ordering in blk_copy_offload suggested (Damien
	  Le Moal)
	- removed multiple copy limit and merged into only one limit
	  (Damien Le Moal)
	- wrap overly long lines (Damien Le Moal)
	- other naming improvements and cleanups (Damien Le Moal)
	- correctly format the code example in description (Damien Le
	  Moal)
	- mark blk_copy_offload as static (kernel test robot)

Changes since v3:
=================
	- added copy_file_range support for zonefs
	- added documentation about new sysfs entries
	- incorporated review comments on v3
	- minor fixes

Changes since v2:
=================
	- fixed possible race condition reported by Damien Le Moal
	- new sysfs controls as suggested by Damien Le Moal
	- fixed possible memory leak reported by Dan Carpenter, lkp
	- minor fixes

Changes since v1:
=================
	- sysfs documentation (Greg KH)
	- 2 bios for copy operation (Bart Van Assche, Mikulas Patocka,
	  Martin K. Petersen, Douglas Gilbert)
	- better payload design (Darrick J. Wong)

Anuj Gupta (1):
  fs/read_write: Enable copy_file_range for block device.

Nitesh Shetty (10):
  block: Introduce queue limits and sysfs for copy-offload support
  Add infrastructure for copy offload in block and request layer.
  block: add copy offload support
  block: add emulation for copy
  fs, block: copy_file_range for def_blk_ops for direct block device
  nvme: add copy offload support
  nvmet: add copy command support for bdev and file ns
  dm: Add support for copy offload
  dm: Enable copy offload for dm-linear target
  null_blk: add support for copy offload

 Documentation/ABI/stable/sysfs-block |  23 ++
 Documentation/block/null_blk.rst     |   5 +
 block/blk-core.c                     |   7 +
 block/blk-lib.c                      | 419 +++++++++++++++++++++++++++
 block/blk-merge.c                    |  41 +++
 block/blk-settings.c                 |  24 ++
 block/blk-sysfs.c                    |  36 +++
 block/blk.h                          |  16 +
 block/elevator.h                     |   1 +
 block/fops.c                         |  25 ++
 drivers/block/null_blk/main.c        |  99 ++++++-
 drivers/block/null_blk/null_blk.h    |   1 +
 drivers/block/null_blk/trace.h       |  23 ++
 drivers/md/dm-linear.c               |   1 +
 drivers/md/dm-table.c                |  37 +++
 drivers/md/dm.c                      |   7 +
 drivers/nvme/host/constants.c        |   1 +
 drivers/nvme/host/core.c             |  79 +++++
 drivers/nvme/host/trace.c            |  19 ++
 drivers/nvme/target/admin-cmd.c      |   9 +-
 drivers/nvme/target/io-cmd-bdev.c    |  97 +++++++
 drivers/nvme/target/io-cmd-file.c    |  50 ++++
 drivers/nvme/target/nvmet.h          |   4 +
 drivers/nvme/target/trace.c          |  19 ++
 fs/read_write.c                      |   8 +-
 include/linux/bio.h                  |   6 +-
 include/linux/blk_types.h            |  10 +
 include/linux/blkdev.h               |  22 ++
 include/linux/device-mapper.h        |   3 +
 include/linux/nvme.h                 |  43 ++-
 30 files changed, 1119 insertions(+), 16 deletions(-)


base-commit: f7dc24b3413851109c4047b22997bd0d95ed52a2
-- 
2.35.1.500.gb896f729e2