On Sun, Jan 28, 2024 at 11:25 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> > On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@xxxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > > not seeing any references to either on the mailing list. I've got a
> > > passthrough filesystem and with more users taking advantage of btrfs and
> > > xfs w/ reflinks there has been some demand for the ability to support it.
> > >
> >
> > [CC fsdevel because my answer's scope is wider than just FUSE]
> >
> > FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> > (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> > application they are using, calling copy_file_range() will propagate via your
> > fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> > clone_file_range().
> >
> > Thus using tools like cp --reflink, on your passthrough filesystem should yield
> > the expected result.

Sorry, cp --reflink indeed uses clone.

> >
> > For a more practical example see:
> > https://bugzilla.samba.org/show_bug.cgi?id=12033
> > Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> >
> > API-wise, there are two main differences between copy_file_range() and
> > FICLONERANGE:
> > 1. copy_file_range() can result in partial copy
> > 2. copy_file_range() can result in more used disk space
> >
> > Other API differences are minor, but the fact that copy_file_range()
> > is a syscall with a @flags argument makes it a candidate for being
> > a super-set of both functionalities.
> >
> > The question is, for your users, are you actually looking for
> > clone_file_range() support? or is best-effort copy_file_range() with
> > clone_file_range() fallback enough?
> >
> > If your users are looking for the atomic clone_file_range() behavior,
> > then a single flag in fuse_copy_file_range_in::flags is enough to
> > indicate to the server that the "atomic clone" behavior is wanted.
> >
> > Note that the @flags argument to copy_file_range() syscall does not
> > support any flags at all at the moment.
> >
> > The only flag defined in the kernel COPY_FILE_SPLICE is for
> > internal use only.
> >
> > We can define a flag COPY_FILE_CLONE to use either only
> > internally in kernel and in FUSE protocol or even also in
> > copy_file_range() syscall.
>
> I don't care how fuse implements ->remap_file_range(), but no change
> to syscall behaviour, please.
>

ok.

> copy_file_range() is supposed to select the best available method
> for copying the data based on kernel side technology awareness that
> the application knows nothing about (e.g. clone, server-side copy,
> block device copy offload, etc). The API is technology agnostic and
> largely future proof because of this; adding flags to say "use this
> specific technology to copy data or fail" is the exact opposite of
> how we want copy_file_range() to work.
>
> i.e. if you want a specific type of "copy" to be done (i.e. clone
> rather than data copy) then call FICLONE or copy the data yourself
> to do exactly what you need. If you just want it done fast as
> possible and don't care about implementation (99% of cases), then
> just call copy_file_range().
>

Technically, a flag COPY_FILE_ATOMIC would be a requirement, not an
implementation detail, but this requirement could currently be
fulfilled only by filesystems that implement remap_file_range().
Never mind, I won't be trying to push a syscall API change myself.

> > Sure, we can also add a new FUSE protocol command for
> > FUSE_CLONE_FILE_RANGE, but I don't think that is
> > necessary.
> > It is certainly not necessary if there is agreement to extend the
> > copy_file_range() syscall to support COPY_FILE_CLONE flag.
>
> We already have FICLONE/FICLONERANGE for this operation. Fuse
> just needs to implement ->remap_file_range() server stubs, and then
> the back end driver can choose to implement it if its storage
> mechanisms support such functionality.

For Antonio's request to support FICLONERANGE with FUSE, that would be
enough, using a new protocol command.

> Then it will get used
> automatically for copy_file_range() for those FUSE drivers, the rest
> will just copy the data in the kernel using splice as they currently
> do...

This is not the current behavior of FUSE as far as I can tell.

The reason is that vfs_copy_file_range() checks whether the fs implements
->copy_file_range(); if it does, it will not fall back to
->remap_file_range() nor to splice.

This is intentional - fs with ->copy_file_range() has full control
including the decision to return whatever error code to userspace.

The problem is that the FUSE kernel driver always implements
->copy_file_range(), regardless of whether the FUSE server implements
FUSE_COPY_FILE_RANGE.

So for a FUSE server that does not implement FUSE_COPY_FILE_RANGE,
fc->no_copy_file_range is true and copy_file_range() returns -EOPNOTSUPP.

So either the fallback from FUSE_COPY_FILE_RANGE to FUSE_CLONE_FILE_RANGE
will be done internally by FUSE, or clone/copy support will need to be
advertised during FUSE_INIT and a different set of fuse_file_operations
will need to be used accordingly, which seems overly complicated.

Thanks,
Amir.
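
P.S. To make the dispatch order described above easier to follow, here is a toy model in Python. It is only a sketch of the behavior as stated in this message, not kernel code and not checked against any particular kernel version. The names FileOps, vfs_copy_file_range_model, splice_copy and fuse_copy_method are hypothetical and exist only for illustration.

```python
import errno


class FileOps:
    """Hypothetical stand-in for a filesystem's file_operations table."""

    def __init__(self, copy_file_range=None, remap_file_range=None):
        self.copy_file_range = copy_file_range
        self.remap_file_range = remap_file_range


def splice_copy(length):
    # Stand-in for the generic in-kernel data copy (splice).
    return length


def vfs_copy_file_range_model(ops, length):
    """Toy model of the dispatch order described in this message."""
    if ops.copy_file_range is not None:
        # The fs method has full control: its result, errors included,
        # is returned as-is, with no fallback to clone or splice.
        return ops.copy_file_range(length)
    if ops.remap_file_range is not None:
        cloned = ops.remap_file_range(length)
        if cloned > 0:
            return cloned
    return splice_copy(length)


def fuse_copy_method(server_supports_copy):
    # Models FUSE always installing ->copy_file_range(), whether or not
    # the server implements FUSE_COPY_FILE_RANGE.
    def method(length):
        if not server_supports_copy:  # i.e. fc->no_copy_file_range is set
            return -errno.EOPNOTSUPP
        return length
    return method


# An xfs/btrfs-like fs: only ->remap_file_range(), so clone is tried first.
reflink_fs = FileOps(remap_file_range=lambda n: n)
# A fs with neither method: falls through to the splice copy.
plain_fs = FileOps()
# FUSE with a server that lacks FUSE_COPY_FILE_RANGE: the error is final.
fuse_no_copy = FileOps(copy_file_range=fuse_copy_method(False))

for name, ops in [("reflink", reflink_fs), ("plain", plain_fs),
                  ("fuse, no server copy", fuse_no_copy)]:
    print(name, vfs_copy_file_range_model(ops, 4096))
```

In this model, adding a ->remap_file_range() to the FUSE entry alone would change nothing for copy_file_range(), because the first branch always wins. That is why the fallback would have to live inside the FUSE method itself, or the set of installed methods would have to depend on what the server advertises.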