Hi all,

Background:
===========
Copy offload is a feature that allows file-systems or storage devices to be
instructed to copy files/logical blocks without requiring involvement of the
local CPU.
We are working on copy offload[1], mainly focused on the NVMe Copy command
(single NS, TP 4065) and NVMe over Fabrics.
Previously, Chaitanya Kulkarni presented a talk on copy offload[2].

Current state of work:
======================
Our latest patch series[1] covers:
1. Driver
   - NVMe Copy command (single NS, TP 4065), including support in
     nvme-target (for block and file back-end).
2. Block layer
   - Block-generic copy (REQ_COPY flag), with an interface accommodating
     two block-devs, and a multi-source/destination interface
   - Emulation, when offload is natively absent
   - dm-linear support (for cases not requiring split)
3. User-interface
   - new ioctl
4. In-kernel user
   - dm-kcopyd
5. Tools
   - fio
   - blktests

Performance:
============
With the async design of copy-emulation/offload using fio[3], we were able to
see the following improvements compared to userspace read and write on an
NVMeOF TCP setup:

Setup 1:
Network Speed: 1000Mb/s
Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Target PC: AMD Ryzen 9 5900X 12-Core Processor
block-size 8k, range 1:
    635% improvement in IO BW (107 MiB/s to 787 MiB/s).
    Network utilisation drops from 97% to 14%.
block-size 2M, range 16:
    2555% improvement in IO BW (100 MiB/s to 2655 MiB/s).
    Network utilisation drops from 89% to 0.62%.

Setup 2:
Network Speed: 100Gb/s
Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
(host and target have the same configuration)
block-size 8k, range 1:
    6.5% improvement in IO BW (791 MiB/s to 843 MiB/s).
    Network utilisation drops from 6.75% to 0.14%.
block-size 2M, range 16:
    15% improvement in IO BW (1027 MiB/s to 1183 MiB/s).
    Network utilisation drops from 8.42% to ~0%.

Overall, in these tests, kernel copy emulation reduces network utilisation
drastically and improves IO bandwidth.

What we will discuss in the proposed session?
=============================================
In this session I would like to discuss, and get the community's opinion on,
the minimum set of requirements for copy offload for this phase. I would also
like to share some of the blockers we are facing and get opinions on how we
can proceed further.

Required attendees:
===================
Martin K. Petersen
Jens Axboe
Christoph Hellwig
Mike Snitzer
Hannes Reinecke
Chaitanya Kulkarni
Bart Van Assche
Damien Le Moal
Mikulas Patocka
Keith Busch
Sagi Grimberg
Javier Gonzalez
Kanchan Joshi

Links:
======
[1] https://lore.kernel.org/lkml/20230112115908.23662-1-nj.shetty@xxxxxxxxxxx/T/#m91a2f506eaa214035a8596fa9aa8d2b9f46654cc
[2] https://lore.kernel.org/all/BYAPR04MB49652C4B75E38F3716F3C06386539@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
[3] https://github.com/vincentkfu/fio/tree/copyoffload