I am very keen on this topic. I've been doing some work on "NVMe simple copy" and would like to discuss the following points and solicit the community's opinion:

- Simple-copy, unlike XCOPY and P2P, is limited to copies within a single namespace. Some of the problems the original XCOPY work [2] faced may not apply to simple-copy, e.g. splitting a single copy because of differing device-specific limits. I hope I'm not missing something in thinking so.

- [Block I/O] An async interface (through io_uring or AIO) so that multiple copy operations can be queued (a rough user-space sketch is appended at the bottom of this mail).

- [File I/O to user-space] I think it may make sense to extend the copy_file_range API to perform in-device copy as well (a minimal usage sketch is inlined further down, after the quoted problem statement).

- [F2FS] F2FS garbage collection could leverage the interface. Currently it goes through the page cache, which is fair. But for relatively cold/warm data (which needs to be garbage-collected anyway), it could instead bypass the host and avoid the scenario where something useful gets thrown out of the cache.

- [ZNS] ZNS users (kernel or user-space) will be log-structured and will benefit from internal copy, but the failure scenarios (partial copy, write-pointer position) need to be discussed.

Thanks,
Kanchan

> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@xxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx; linux-nvme@xxxxxxxxxxxxxxxxxxx; dm-devel@xxxxxxxxxx; lsf-pc@lists.linux-foundation.org
> Cc: axboe@xxxxxxxxx; msnitzer@xxxxxxxxxx; bvanassche@xxxxxxx; Martin K. Petersen <martin.petersen@xxxxxxxxxx>; Matias Bjorling <Matias.Bjorling@xxxxxxx>; Stephen Bates <sbates@xxxxxxxxxxxx>; roland@xxxxxxxxxxxxxxx; mpatocka@xxxxxxxxxx; hare@xxxxxxx; Keith Busch <kbusch@xxxxxxxxxx>; rwheeler@xxxxxxxxxx; Christoph Hellwig <hch@xxxxxx>; frederick.knight@xxxxxxxxxx; zach.brown@xxxxxx
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
>
> Hi all,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file systems or storage devices to be
> instructed to copy files/logical blocks without requiring involvement of the
> local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded performance
> is limited by the end of Dennard scaling, and multi-threaded performance is
> slowing down due to Moore's law limitations. With the rise of the SNIA
> Computational Storage Technical Working Group (TWG) [2], offloading
> computations to the device or over the fabrics is becoming popular, as
> several solutions are already available [2]. One common operation that is
> popular in the kernel and is not merged yet is copy offload over the fabrics
> or onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work, done by Martin, is available here [3]. The latest work,
> posted by Mikulas [4], is not merged yet. These two approaches are totally
> different from each other. Several storage vendors discourage mixing copy
> offload requests with regular READ/WRITE I/O. Also, the fact that the
> operation fails if a copy request ever needs to be split as it traverses the
> stack has the unfortunate side-effect of preventing copy offload from working
> in pretty much every common deployment configuration out there.
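For reference, the user-space interface I am referring to in the [File I/O]
point above is copy_file_range(2), which applications can already use today;
a minimal usage sketch follows (nothing in it is new API). The open question
is only whether the kernel, when source and destination sit on the same
device/namespace, could satisfy this call with an in-device copy instead of a
host read/write loop.

/*
 * Minimal sketch: how an application drives copy_file_range(2) today.
 * The discussion point is whether the kernel could satisfy this with an
 * in-device copy when src and dst live on the same device/namespace.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return 1;
	}

	int in = open(argv[1], O_RDONLY);
	int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}

	/* Determine how much to copy, then rewind. */
	off_t left = lseek(in, 0, SEEK_END);
	lseek(in, 0, SEEK_SET);

	/* The kernel decides how the bytes move; ideally this is offloaded. */
	while (left > 0) {
		ssize_t ret = copy_file_range(in, NULL, out, NULL, left, 0);
		if (ret <= 0) {
			perror("copy_file_range");
			return 1;
		}
		left -= ret;
	}

	close(in);
	close(out);
	return 0;
}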
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> [3] is hard to extend to arbitrary DM/MD stacking without splitting the
> command in two, one for copying IN and one for copying OUT, and [4]
> demonstrates why [3] is therefore not a suitable candidate. Also, with [4]
> there is an unresolved problem in the two-command approach: how to handle
> changes to the DM layout between the IN and OUT operations.
>
> * Why does the Linux kernel storage system need Copy Offload support now?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and its solutions [2],
> the existing SCSI XCOPY support in the protocol, recent advancements in
> Linux kernel file-system support for zoned devices (zonefs [5]), and
> peer-to-peer DMA support in the Linux kernel, mainly for NVMe devices [7],
> NVMe devices and subsystems (NVMe PCIe/NVMe-oF) will eventually benefit
> from a copy offload operation.
>
> With this background we have a significant number of use-cases that are
> strong candidates for the outstanding Linux kernel block-layer copy offload
> support, so that the Linux kernel storage subsystem can address the
> previously mentioned problems [1] and allow efficient offloading of
> data-related operations (such as move/copy, etc.).
>
> For reference, the following is the list of use-cases/candidates waiting for
> copy offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers (DM/MD) supporting XCOPY.
> 3. Computational storage solutions.
> 4. File systems :- local, NFS and zonefs.
> 5. Block devices :- distributed, local, and zoned devices.
> 6. Peer-to-peer DMA support solutions.
> 7. Potentially the NVMe subsystem, both NVMe PCIe and NVMe-oF.
>
> * What we will discuss in the proposed session
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for a copy offload implementation?
> 2. Discussion about having a file-system interface.
> 3. Discussion about having the right system call for user-space.
> 4. What is the right way to move this work forward?
> 5. How can we help to contribute and move this work forward?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite block layer, device driver and file-system developers to :-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
>
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
> Regards,
> Chaitanya
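As mentioned in the [Block I/O] point at the top, here is a rough sketch of
how async copy queuing could look from user-space if io_uring grew a copy
opcode. To be clear, IORING_OP_COPY_RANGE, its opcode value and the
(source offset, destination offset, length) encoding below are entirely made
up, nothing like this exists in the kernel today; the sketch only illustrates
that multiple copy operations could be queued on one ring and reaped
asynchronously, the same way reads and writes are.

/*
 * Hypothetical sketch only: IORING_OP_COPY_RANGE does not exist; the
 * opcode value and sqe field encoding are invented for illustration.
 */
#include <liburing.h>
#include <stdint.h>
#include <stdio.h>

#define QD			8
#define IORING_OP_COPY_RANGE	200	/* made-up opcode value */

struct copy_req {
	uint64_t src_off;	/* byte offset to copy from */
	uint64_t dst_off;	/* byte offset to copy to */
	uint32_t len;		/* number of bytes to copy */
};

int submit_copies(int fd, struct copy_req *reqs, int nr)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int i, ret;

	ret = io_uring_queue_init(QD, &ring, 0);
	if (ret < 0)
		return ret;

	/* Queue several copy requests, then submit them in one go. */
	for (i = 0; i < nr; i++) {
		sqe = io_uring_get_sqe(&ring);
		if (!sqe)
			break;
		/* Invented encoding: source offset in sqe->addr,
		 * destination offset in sqe->off, length in sqe->len. */
		io_uring_prep_rw(IORING_OP_COPY_RANGE, sqe, fd,
				 (void *)(uintptr_t)reqs[i].src_off,
				 reqs[i].len, reqs[i].dst_off);
		io_uring_sqe_set_data(sqe, &reqs[i]);
	}
	io_uring_submit(&ring);

	/* Reap completions; how partial copies get reported is one of the
	 * failure-scenario questions raised above. */
	for (; i > 0; i--) {
		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		if (cqe->res < 0)
			fprintf(stderr, "copy failed: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}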