On Mon, Jan 29, 2024 at 3:54 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sun, Jan 28, 2024 at 11:25 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> > > On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@xxxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > > > not seeing any references to either on the mailing list. I've got a
> > > > passthrough filesystem and with more users taking advantage of btrfs and
> > > > xfs w/ reflinks there has been some demand for the ability to support it.
> > > >
> > >
> > > [CC fsdevel because my answer's scope is wider than just FUSE]
> > >
> > > FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> > > (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> > > application they are using, calling copy_file_range() will propagate via your
> > > fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> > > clone_file_range().
> > >
> > > Thus using tools like cp --reflink on your passthrough filesystem should yield
> > > the expected result.
>
> Sorry, cp --reflink indeed uses clone
>
> > >
> > > For a more practical example see:
> > > https://bugzilla.samba.org/show_bug.cgi?id=12033
> > > Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> > >
> > > API-wise, there are two main differences between copy_file_range() and
> > > FICLONERANGE:
> > > 1. copy_file_range() can result in a partial copy
> > > 2. copy_file_range() can result in more used disk space
> > >
> > > Other API differences are minor, but the fact that copy_file_range()
> > > is a syscall with a @flags argument makes it a candidate for being
> > > a super-set of both functionalities.
> > >
> > > The question is, for your users, are you actually looking for
> > > clone_file_range() support?
> > > or is best-effort copy_file_range() with
> > > clone_file_range() fallback enough?
> > >
> > > If your users are looking for the atomic clone_file_range() behavior,
> > > then a single flag in fuse_copy_file_range_in::flags is enough to
> > > indicate to the server that the "atomic clone" behavior is wanted.
> > >
> > > Note that the @flags argument to the copy_file_range() syscall does not
> > > support any flags at all at the moment.
> > >
> > > The only flag defined in the kernel, COPY_FILE_SPLICE, is for
> > > internal use only.
> > >
> > > We can define a flag COPY_FILE_CLONE to use either only
> > > internally in the kernel and in the FUSE protocol, or also in the
> > > copy_file_range() syscall.
> >
> > I don't care how fuse implements ->remap_file_range(), but no change
> > to syscall behaviour, please.
> >
>
> ok.
>
> > copy_file_range() is supposed to select the best available method
> > for copying the data based on kernel side technology awareness that
> > the application knows nothing about (e.g. clone, server-side copy,
> > block device copy offload, etc). The API is technology agnostic and
> > largely future proof because of this; adding flags to say "use this
> > specific technology to copy data or fail" is the exact opposite of
> > how we want copy_file_range() to work.
> >
> > i.e. if you want a specific type of "copy" to be done (i.e. clone
> > rather than data copy) then call FICLONE or copy the data yourself
> > to do exactly what you need. If you just want it done fast as
> > possible and don't care about implementation (99% of cases), then
> > just call copy_file_range().
> >
>
> Technically, a flag COPY_FILE_ATOMIC would be a requirement,
> not an implementation detail, but this requirement could currently be
> fulfilled only by filesystems that implement remap_file_range(). Never mind,
> I won't be trying to push a syscall API change myself.
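Dave's advice maps onto a common userspace pattern: ask for a clone explicitly with FICLONE when shared extents are the requirement, and otherwise let copy_file_range() choose the method, looping because it may copy less than requested (point 1 in Amir's list). The sketch below assumes Linux with glibc 2.27 or later for the copy_file_range() wrapper. The helper names clone_or_copy() and clone_or_copy_path() are hypothetical. The set of errno values treated as "cloning is not possible here" (EOPNOTSUPP, EXDEV, EINVAL, ENOTTY) is also an assumption, not a documented contract.

```c
#define _GNU_SOURCE          /* copy_file_range() wrapper, glibc >= 2.27 */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>        /* FICLONE */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if dst now shares src's extents (clone),
 * 0 if the data was copied with copy_file_range(), -1 on error (errno set). */
int clone_or_copy(int src_fd, int dst_fd)
{
    /* FICLONE is all-or-nothing: either the whole file is cloned or it fails. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 1;

    /* Assumed set of "clone not possible here" errors; anything else is fatal. */
    if (errno != EOPNOTSUPP && errno != EXDEV && errno != EINVAL && errno != ENOTTY)
        return -1;

    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -1;

    off_t off_in = 0, off_out = 0;
    size_t remaining = (size_t)st.st_size;

    /* copy_file_range() may return a short count, so keep calling it. */
    while (remaining > 0) {
        ssize_t n = copy_file_range(src_fd, &off_in, dst_fd, &off_out,
                                    remaining, 0 /* no flags are defined */);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)          /* source ended early (e.g. truncated meanwhile) */
            break;
        assert((size_t)n <= remaining);
        remaining -= (size_t)n;
    }
    return 0;
}

/* Hypothetical convenience wrapper: dst is created or truncated first. */
int clone_or_copy_path(const char *src_path, const char *dst_path)
{
    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        int saved = errno;
        close(src);
        errno = saved;
        return -1;
    }
    int ret = clone_or_copy(src, dst);
    int saved = errno;
    close(src);
    close(dst);
    errno = saved;
    return ret;
}
```

FICLONE here expresses a requirement (share extents or fail), while copy_file_range() is left free to pick clone, offload or splice. On a FUSE mount whose server lacks FUSE_COPY_FILE_RANGE, copy_file_range() fails with EOPNOTSUPP, so a fully portable tool would also need a plain read()/write() fallback, which this sketch omits.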
>
> > > Sure, we can also add a new FUSE protocol command for
> > > FUSE_CLONE_FILE_RANGE, but I don't think that is
> > > necessary.
> > > It is certainly not necessary if there is agreement to extend the
> > > copy_file_range() syscall to support COPY_FILE_CLONE flag.
> >
> > We already have FICLONE/FICLONERANGE for this operation. Fuse
> > just needs to implement ->remap_file_range() server stubs, and then
> > the back end driver can choose to implement it if its storage
> > mechanisms support such functionality.
>
> For Antonio's request to support FICLONERANGE with FUSE,
> that would be enough using a new protocol command.
>
> > Then it will get used
> > automatically for copy_file_range() for those FUSE drivers, the rest
> > will just copy the data in the kernel using splice as they currently
> > do...
>
> This is not the current behavior of FUSE as far as I can tell.
> The reason is that vfs_copy_file_range() checks whether the fs implements
> ->copy_file_range(); if it does, it will not fall back to ->remap_file_range()
> nor to splice. This is intentional: an fs with ->copy_file_range() has full
> control, including the decision to return whatever error code to userspace.
>
> The problem is that the FUSE kernel driver always implements
> ->copy_file_range(), regardless of whether the FUSE server implements
> FUSE_COPY_FILE_RANGE. So for a FUSE server that does not
> implement FUSE_COPY_FILE_RANGE, fc->no_copy_file_range is
> true and copy_file_range() returns -EOPNOTSUPP.
>
> So either the fallback from FUSE_COPY_FILE_RANGE to
> FUSE_CLONE_FILE_RANGE will be done internally by FUSE,
> or clone/copy support will need to be advertised during FUSE_INIT
> and a different set of fuse_file_operations will need to be used
> accordingly, which seems overly complicated.
>
Note that FUSE_COPY_FILE_RANGE uses struct fuse_write_out to report the
number of bytes copied between files (a uint32_t size field), and therefore
it cannot copy more than 2^32-1 bytes per call.
For example, running cp --reflink on a 1T file yields multiple calls to
copy_file_range() by userspace.

- Shachar.

> Thanks,
> Amir.
>
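A back-of-the-envelope check of Shachar's point: if each FUSE_COPY_FILE_RANGE reply can report at most 2^32-1 bytes (the uint32_t size in struct fuse_write_out), then moving N bytes takes at least ceil(N / (2^32-1)) requests. The sketch below assumes "1T" means 1 TiB (2^40 bytes). It only illustrates the arithmetic; it is not the FUSE kernel driver's code. REPLY_SIZE_MAX, requests_needed(), next_request_len() and simulate_copy() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed per-request ceiling: the largest value a uint32_t "bytes copied"
 * reply field (as in struct fuse_write_out) can hold. */
#define REPLY_SIZE_MAX ((uint64_t)UINT32_MAX)

/* Minimum number of requests to move `total` bytes when each request can
 * report at most `cap` bytes copied (ceiling division). */
static uint64_t requests_needed(uint64_t total, uint64_t cap)
{
    assert(cap > 0);
    return total / cap + (total % cap != 0);
}

/* Length for the next request, clamped so the reply cannot overflow. */
static uint64_t next_request_len(uint64_t remaining)
{
    return remaining < REPLY_SIZE_MAX ? remaining : REPLY_SIZE_MAX;
}

/* Simulates a copy loop in which every request is fully satisfied and
 * returns how many requests were issued. */
static uint64_t simulate_copy(uint64_t total)
{
    uint64_t remaining = total, calls = 0;
    while (remaining > 0) {
        remaining -= next_request_len(remaining);
        calls++;
    }
    return calls;
}
```

Since 2^40 = 256 * (2^32-1) + 256, requests_needed(1ULL << 40, REPLY_SIZE_MAX) evaluates to 257, and simulate_copy() issues the same 257 requests.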