FWIW - the design of NVMe Simple Copy specifically included versioning of the data structure that describes what to copy. The reason for that was random people's desire to complexify the Simple Copy command. Specifically, room was designed into the data structure to accommodate a source NSID (to allow cross-namespace copy - the intention being namespaces attached to the same controller), and room to accommodate the KPIO key tag value for each source range. Other people thought they could use this data structure versioning to design a fully SCSI XCOPY compatible data structure.

My point is just to consider the flexibility and extensibility of the OS interfaces when thinking about "Simple Copy". I'm just not sure how SIMPLE it will remain.

Fred
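To make the extensibility point concrete, here is an illustrative C sketch of that kind of versioned, range-based copy payload. This is not the ratified NVMe data structure; field names and widths are placeholders, and it only shows the fields mentioned above: a format/version selector, per-range source LBA and length, and the room left for a source NSID and a per-range KPIO key tag.

    /*
     * Illustrative only -- NOT the ratified NVMe layout.  The point is the
     * shape of the interface: a format/version field selects the descriptor
     * layout, and each source-range entry leaves room for fields that later
     * versions may use (a source NSID for cross-namespace copy, a KPIO key
     * tag per range).  Field names and widths are placeholders.
     */
    #include <stdint.h>

    struct copy_source_range {
            uint64_t slba;          /* starting LBA of this source range  */
            uint16_t nlb;           /* number of logical blocks to copy   */
            uint32_t src_nsid;      /* room for a source namespace ID     */
            uint16_t kpio_key_tag;  /* room for a per-range KPIO key tag  */
            uint8_t  rsvd[16];      /* room left for future extensions    */
    };

    struct copy_descriptor {
            uint8_t  format;        /* descriptor format/version          */
            uint8_t  nr_ranges;     /* number of source-range entries     */
            uint8_t  rsvd[14];
            struct copy_source_range ranges[];  /* the ranges to copy     */
    };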
-----Original Message-----
From: joshi.k@xxxxxxxxxxx <joshi.k@xxxxxxxxxxx>
Sent: Thursday, February 13, 2020 12:11 AM
To: 'Chaitanya Kulkarni' <Chaitanya.Kulkarni@xxxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx; linux-nvme@xxxxxxxxxxxxxxxxxxx; dm-devel@xxxxxxxxxx; lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: axboe@xxxxxxxxx; msnitzer@xxxxxxxxxx; bvanassche@xxxxxxx; 'Martin K. Petersen' <martin.petersen@xxxxxxxxxx>; 'Matias Bjorling' <Matias.Bjorling@xxxxxxx>; 'Stephen Bates' <sbates@xxxxxxxxxxxx>; roland@xxxxxxxxxxxxxxx; joshi.k@xxxxxxxxxxx; mpatocka@xxxxxxxxxx; hare@xxxxxxx; 'Keith Busch' <kbusch@xxxxxxxxxx>; rwheeler@xxxxxxxxxx; 'Christoph Hellwig' <hch@xxxxxx>; Knight, Frederick <Frederick.Knight@xxxxxxxxxx>; zach.brown@xxxxxx; joshi.k@xxxxxxxxxxx; javier@xxxxxxxxxxx
Subject: RE: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload

I am very keen on this topic. I've been doing some work on "NVMe Simple Copy" and would like to discuss it and solicit the community's opinion on the following:

- Simple Copy, unlike XCOPY and P2P, is limited to copies within a single namespace. Some of the problems that the original XCOPY work [2] faced may therefore not apply to Simple Copy, e.g. splitting a single copy because of differing device-specific limits. I hope I'm not missing something in thinking so.

- [Block I/O] An async interface (through io_uring or AIO) so that multiple copy operations can be queued.

- [File I/O to user space] I think it may make sense to extend the copy_file_range API to do in-device copy as well (see the sketch below).

- [F2FS] F2FS garbage collection may leverage the interface. Currently it goes through the page cache, which is fair. But for relatively cold/warm data (which needs to be garbage-collected anyway), it could instead bypass the host and avoid a scenario where something useful gets thrown out of the cache.

- [ZNS] ZNS users (kernel or user space) will be log-structured and will benefit from internal copy. But failure scenarios (partial copy, write-pointer position) need to be discussed.

Thanks,
Kanchan
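For reference, here is a minimal user-space sketch of the existing copy_file_range(2) interface mentioned above (glibc 2.27 or newer); the file names and copy length are placeholders. Whether such a call is ever satisfied by an in-device copy such as NVMe Simple Copy would be up to the filesystem and block layer underneath.

    /*
     * Minimal sketch of copy_file_range(2) as it exists today.
     * File names and the 1 MiB length are only example values.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
            int fd_in = open("src.dat", O_RDONLY);
            int fd_out = open("dst.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);

            if (fd_in < 0 || fd_out < 0) {
                    perror("open");
                    return EXIT_FAILURE;
            }

            off64_t off_in = 0, off_out = 0;
            size_t remaining = 1 << 20;     /* copy 1 MiB for the example */

            while (remaining > 0) {
                    ssize_t ret = copy_file_range(fd_in, &off_in,
                                                  fd_out, &off_out,
                                                  remaining, 0);
                    if (ret < 0) {
                            perror("copy_file_range");
                            return EXIT_FAILURE;
                    }
                    if (ret == 0)           /* hit EOF on the source */
                            break;
                    remaining -= (size_t)ret;  /* short copies are allowed */
            }

            close(fd_in);
            close(fd_out);
            return EXIT_SUCCESS;
    }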
> -----Original Message-----
> From: linux-nvme [mailto:linux-nvme-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Chaitanya Kulkarni
> Sent: Tuesday, January 7, 2020 11:44 PM
> To: linux-block@xxxxxxxxxxxxxxx; linux-scsi@xxxxxxxxxxxxxxx; linux-nvme@xxxxxxxxxxxxxxxxxxx; dm-devel@xxxxxxxxxx; lsf-pc@lists.linux-foundation.org
> Cc: axboe@xxxxxxxxx; msnitzer@xxxxxxxxxx; bvanassche@xxxxxxx; Martin K. Petersen <martin.petersen@xxxxxxxxxx>; Matias Bjorling <Matias.Bjorling@xxxxxxx>; Stephen Bates <sbates@xxxxxxxxxxxx>; roland@xxxxxxxxxxxxxxx; mpatocka@xxxxxxxxxx; hare@xxxxxxx; Keith Busch <kbusch@xxxxxxxxxx>; rwheeler@xxxxxxxxxx; Christoph Hellwig <hch@xxxxxx>; frederick.knight@xxxxxxxxxx; zach.brown@xxxxxx
> Subject: [LSF/MM/BFP ATTEND] [LSF/MM/BFP TOPIC] Storage: Copy Offload
>
> Hi all,
>
> * Background :-
> -----------------------------------------------------------------------
>
> Copy offload is a feature that allows file systems or storage devices to be instructed to copy files/logical blocks without requiring involvement of the local CPU.
>
> With reference to the RISC-V summit keynote [1], single-threaded performance gains are limited by the end of Dennard scaling, and multi-threaded scaling is slowing down due to the limits of Moore's law. With the rise of the SNIA Computational Storage Technical Work Group (TWG) [2], offloading computation to the device or over the fabrics is becoming popular, and several solutions are already available [2]. One common operation that is popular in the kernel but not merged yet is copy offload, over the fabrics or onto the device.
>
> * Problem :-
> -----------------------------------------------------------------------
>
> The original work, done by Martin, is available here [3]. The latest work, posted by Mikulas [4], is not merged yet. The two approaches are totally different from each other. Several storage vendors discourage mixing copy offload requests with regular READ/WRITE I/O. Also, the fact that the operation fails if a copy request ever needs to be split as it traverses the stack has the unfortunate side effect of preventing copy offload from working in pretty much every common deployment configuration out there.
>
> * Current state of the work :-
> -----------------------------------------------------------------------
>
> [3] is hard to apply to arbitrary DM/MD stacking without splitting the command in two, one for copying IN and one for copying OUT, which is how [4] demonstrates that [3] is not a suitable candidate. With [4], however, there is an unresolved problem with the two-command approach: how to handle changes to the DM layout between the IN and OUT operations.
>
> * Why does the Linux Kernel Storage System need Copy Offload support now?
> -----------------------------------------------------------------------
>
> With the rise of the SNIA Computational Storage TWG and its solutions [2], the existing SCSI XCOPY support in the protocol, recent advancements in Linux kernel file system support for zoned devices (zonefs [5]), and peer-to-peer DMA support in the Linux kernel, mainly for NVMe devices [7], NVMe devices and subsystems (NVMe PCIe/NVMeOF) will eventually benefit from a copy offload operation.
>
> With this background we have a significant number of use cases that are strong candidates for the outstanding Linux kernel block layer copy offload support, so that the Linux kernel storage subsystem can address the previously mentioned problems [1] and allow efficient offloading of data-related operations (such as move/copy, etc.).
>
> For reference, the following is the list of use cases/candidates waiting for copy offload support :-
>
> 1. SCSI-attached storage arrays.
> 2. Stacking drivers (DM/MD) supporting XCOPY.
> 3. Computational storage solutions.
> 4. File systems :- local, NFS, and zonefs.
> 5. Block devices :- distributed, local, and zoned devices.
> 6. Peer-to-peer DMA support solutions.
> 7. Potentially the NVMe subsystem, both NVMe PCIe and NVMeOF.
>
> * What we will discuss in the proposed session?
> -----------------------------------------------------------------------
>
> I'd like to propose a session to go over this topic to understand :-
>
> 1. What are the blockers for a copy offload implementation?
> 2. Discussion about having a file system interface.
> 3. Discussion about having the right system call for user space.
> 4. What is the right way to move this work forward?
> 5. How can we help to contribute and move this work forward?
>
> * Required Participants :-
> -----------------------------------------------------------------------
>
> I'd like to invite block layer, device driver, and file system developers to :-
>
> 1. Share their opinion on the topic.
> 2. Share their experience and any other issues with [4].
> 3. Uncover additional details that are missing from this proposal.
>
> Required attendees :-
>
> Martin K. Petersen
> Jens Axboe
> Christoph Hellwig
> Bart Van Assche
> Stephen Bates
> Zach Brown
> Roland Dreier
> Ric Wheeler
> Trond Myklebust
> Mike Snitzer
> Keith Busch
> Sagi Grimberg
> Hannes Reinecke
> Frederick Knight
> Mikulas Patocka
> Matias Bjørling
>
> [1] https://content.riscv.org/wp-content/uploads/2018/12/A-New-Golden-Age-for-Computer-Architecture-History-Challenges-and-Opportunities-David-Patterson-.pdf
> [2] https://www.snia.org/computational
>     https://www.napatech.com/support/resources/solution-descriptions/napatech-smartnic-solution-for-hardware-offload/
>     https://www.eideticom.com/products.html
>     https://www.xilinx.com/applications/data-center/computational-storage.html
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git xcopy
> [4] https://www.spinics.net/lists/linux-block/msg00599.html
> [5] https://lwn.net/Articles/793585/
> [6] https://nvmexpress.org/new-nvmetm-specification-defines-zoned-namespaces-zns-as-go-to-industry-technology/
> [7] https://github.com/sbates130272/linux-p2pmem
> [8] https://kernel.dk/io_uring.pdf
>
> Regards,
> Chaitanya