Re: [fuse-devel] FICLONE / FICLONERANGE support

On Mon, Jan 29, 2024 at 3:54 PM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Sun, Jan 28, 2024 at 11:25 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > On Sun, Jan 28, 2024 at 12:07:22PM +0200, Amir Goldstein wrote:
> > > On Sun, Jan 28, 2024 at 2:31 AM Antonio SJ Musumeci <trapexit@xxxxxxxxxx> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Has anyone investigated adding support for FICLONE and FICLONERANGE? I'm
> > > > not seeing any references to either on the mailing list. I've got a
> > > > passthrough filesystem and with more users taking advantage of btrfs and
> > > > xfs w/ reflinks there has been some demand for the ability to support it.
> > > >
> > >
> > > [CC fsdevel because my answer's scope is wider than just FUSE]
> > >
> > > FWIW, the kernel implementation of copy_file_range() calls remap_file_range()
> > > (a.k.a. clone_file_range()) for both xfs and btrfs, so if your users control the
> > > application they are using, calling copy_file_range() will propagate via your
> > > fuse filesystem correctly to underlying xfs/btrfs and will effectively result in
> > > clone_file_range().
> > >
> > > Thus, using tools like cp --reflink on your passthrough filesystem should yield
> > > the expected result.
>
> Sorry, cp --reflink indeed uses clone
>
> > >
> > > For a more practical example see:
> > > https://bugzilla.samba.org/show_bug.cgi?id=12033
> > > Since Samba 4.1, server-side-copy is implemented as copy_file_range()
> > >
> > > API-wise, there are two main differences between copy_file_range() and
> > > FICLONERANGE:
> > > 1. copy_file_range() can result in partial copy
> > > 2. copy_file_range() can result in more used disk space
> > >
> > > Other API differences are minor, but the fact that copy_file_range()
> > > is a syscall with a @flags argument makes it a candidate for being
> > > a super-set of both functionalities.
> > >
> > > The question is, for your users, are you actually looking for
> > > clone_file_range() support? or is best-effort copy_file_range() with
> > > clone_file_range() fallback enough?
> > >
> > > If your users are looking for the atomic clone_file_range() behavior,
> > > then a single flag in fuse_copy_file_range_in::flags is enough to
> > > indicate to the server that the "atomic clone" behavior is wanted.
> > >
> > > Note that the @flags argument to copy_file_range() syscall does not
> > > support any flags at all at the moment.
> > >
> > > The only flag defined in the kernel, COPY_FILE_SPLICE, is for
> > > internal use only.
> > >
> > > We can define a flag COPY_FILE_CLONE to use either only
> > > internally in kernel and in FUSE protocol or even also in
> > > copy_file_range() syscall.
> >
> > I don't care how fuse implements ->remap_file_range(), but no change
> > to syscall behaviour, please.
> >
>
> ok.
>
> > copy_file_range() is supposed to select the best available method
> > for copying the data based on kernel side technology awareness that
> > the application knows nothing about (e.g. clone, server-side copy,
> > block device copy offload, etc). The API is technology agnostic and
> > largely future proof because of this; adding flags to say "use this
> > specific technology to copy data or fail" is the exact opposite of
> > how we want copy_file_range() to work.
> >
> > i.e. if you want a specific type of "copy" to be done (i.e. clone
> > rather than data copy) then call FICLONE or copy the data yourself
> > to do exactly what you need. If you just want it done as fast as
> > possible and don't care about implementation (99% of cases), then
> > just call copy_file_range().
> >
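To make the "call FICLONE if you specifically want a clone" option concrete,
here is a minimal userspace sketch of both ioctls. The helper names
clone_whole_file() and clone_range() are made up for illustration; FICLONE,
FICLONERANGE and struct file_clone_range come from <linux/fs.h>. Whether the
call succeeds depends entirely on the filesystem underneath.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open() flags, for callers that open the files */
#include <stdint.h>
#include <linux/fs.h>   /* FICLONE, FICLONERANGE, struct file_clone_range */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make dst_fd share all of src_fd's data blocks.
 * Either the whole clone happens or the call fails; no data is copied.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int clone_whole_file(int src_fd, int dst_fd)
{
    assert(src_fd >= 0 && dst_fd >= 0);
    return ioctl(dst_fd, FICLONE, src_fd);
}

/*
 * Hypothetical helper: clone `len` bytes starting at src_off in src_fd
 * to dst_off in dst_fd. A length of 0 means "up to the end of the source".
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int clone_range(int src_fd, uint64_t src_off, uint64_t len,
                       int dst_fd, uint64_t dst_off)
{
    struct file_clone_range args = {
        .src_fd = src_fd,
        .src_offset = src_off,
        .src_length = len,
        .dest_offset = dst_off,
    };

    assert(src_fd >= 0 && dst_fd >= 0);
    return ioctl(dst_fd, FICLONERANGE, &args);
}
```

A caller that only cares about speed would skip these and call
copy_file_range() instead, leaving the choice of mechanism to the kernel.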
>
> Technically, a flag COPY_FILE_ATOMIC would be a requirement
> not an implementation detail, but this requirement could currently be
> fulfilled only by fs that implement remap_file_range(), but never mind,
> I won't be trying to push a syscall API change myself.
>
> > > Sure, we can also add a new FUSE protocol command for
> > > FUSE_CLONE_FILE_RANGE, but I don't think that is
> > > necessary.
> > > It is certainly not necessary if there is agreement to extend the
> > > copy_file_range() syscall to support COPY_FILE_CLONE flag.
> >
> > We already have FICLONE/FICLONERANGE for this operation. Fuse
> > just needs to implement ->remap_file_range() server stubs, and then
> > the back end driver can choose to implement it if its storage
> > mechanisms support such functionality.
>
> For Antonio's request to support FICLONERANGE with FUSE,
> that would be enough using a new protocol command.
>
> > Then it will get used
> > automatically for copy_file_range() for those FUSE drivers, the rest
> > will just copy the data in the kernel using splice as they currently
> > do...
>
> This is not the current behavior of FUSE as far as I can tell.
> The reason is that vfs_copy_file_range() checks whether the fs implements
> ->copy_file_range(); if it does, it will not fall back to ->remap_file_range()
> nor to splice. This is intentional - fs with ->copy_file_range() has full
> control including the decision to return whatever error code to userspace.
>
> The problem is that the FUSE kernel driver always implements
> ->copy_file_range(), regardless of whether the FUSE server implements
> FUSE_COPY_FILE_RANGE. So for a FUSE server that does not
> implement FUSE_COPY_FILE_RANGE, fc->no_copy_file_range is
> true and copy_file_range() returns -EOPNOTSUPP.
>
> So either the fallback from FUSE_COPY_FILE_RANGE to
> FUSE_CLONE_FILE_RANGE will be done internally by FUSE,
> or clone/copy support will need to be advertised during FUSE_INIT
> and a different set of fuse_file_operations will need to be used
> accordingly, which seems overly complicated.
>
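The dispatch order described above can be sketched as a toy userspace model.
This is not kernel code: struct toy_fops and every toy_* function are
hypothetical stand-ins, and the model only mirrors the behaviour as described
in this thread (an fs that provides ->copy_file_range() gets the final say,
otherwise clone is tried before falling back to a plain copy).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for the two relevant file_operations hooks (hypothetical). */
struct toy_fops {
    long (*copy_file_range)(size_t len);
    long (*remap_file_range)(size_t len);
};

/* Stand-in for the generic in-kernel data copy; always "copies" len bytes. */
static long toy_splice_copy(size_t len)
{
    return (long)len;
}

/* Model of the dispatch: returns bytes copied or a negative errno value. */
static long toy_vfs_copy_file_range(const struct toy_fops *f, size_t len)
{
    assert(f != NULL);

    /* An fs with its own ->copy_file_range() is in full control:
     * whatever it returns, including an error, goes back to the caller. */
    if (f->copy_file_range)
        return f->copy_file_range(len);

    /* Otherwise try to clone, and use the result if anything was cloned. */
    if (f->remap_file_range) {
        long ret = f->remap_file_range(len);
        if (ret > 0)
            return ret;
    }

    /* Last resort: copy the data. */
    return toy_splice_copy(len);
}

/* Models FUSE when the server lacks FUSE_COPY_FILE_RANGE: the hook is
 * always present, but it can only report that the operation is unsupported. */
static long toy_fuse_copy_unsupported(size_t len)
{
    (void)len;
    return -EOPNOTSUPP;
}

/* Models a successful clone of the whole requested range. */
static long toy_clone_ok(size_t len)
{
    return (long)len;
}
```

In this model a struct toy_fops with both hooks set to the two functions above
returns -EOPNOTSUPP even though a clone hook exists, which is the situation
that adding ->remap_file_range() to FUSE alone would not fix.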
Note that FUSE_COPY_FILE_RANGE uses struct fuse_write_out to report
the number of bytes copied between files (uint32_t size), and therefore
cannot copy more than 2^32-1 bytes per call. For example, copying a 1T
file with cp --reflink results in multiple copy_file_range() calls from
userspace.
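For illustration, this is roughly what a userspace caller has to do to cope
with short copies. It is a minimal sketch, not taken from cp: the names
copy_range_all(), copy_file_by_path() and the 1 GiB COPY_CHUNK are made up for
this example, and it assumes glibc 2.27 or later for the copy_file_range()
wrapper.

```c
#define _GNU_SOURCE     /* copy_file_range() declaration in <unistd.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Arbitrary per-call request size chosen for this sketch. */
#define COPY_CHUNK ((uint64_t)1 << 30)

/*
 * Copy up to `len` bytes from fd_in to fd_out at their current offsets.
 * A single call may copy less than requested, so loop until everything
 * is copied or the source runs out.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static int64_t copy_range_all(int fd_in, int fd_out, uint64_t len)
{
    uint64_t done = 0;

    while (done < len) {
        uint64_t left = len - done;
        size_t want = (size_t)(left < COPY_CHUNK ? left : COPY_CHUNK);
        ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL, want, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)                 /* source hit EOF early */
            break;
        assert((size_t)n <= want);  /* never more than we asked for */
        done += (uint64_t)n;
    }
    return (int64_t)done;
}

/* Hypothetical wrapper: copy a whole file by path using the loop above. */
static int64_t copy_file_by_path(const char *src, const char *dst)
{
    struct stat st;
    int64_t ret = -1;
    int saved_errno;
    int in, out;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out >= 0) {
        if (fstat(in, &st) == 0)
            ret = copy_range_all(in, out, (uint64_t)st.st_size);
        saved_errno = errno;
        close(out);
        errno = saved_errno;
    }

    saved_errno = errno;            /* keep the first error for the caller */
    close(in);
    errno = saved_errno;
    return ret;
}
```

The loop terminates on a zero return so that a source file truncated during
the copy does not cause it to spin.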

- Shachar.

> Thanks,
> Amir.
>




