Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write

On Tue, Jun 6, 2023 at 4:06 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Tue, 6 Jun 2023 at 13:19, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 6, 2023 at 12:49 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:

[...]

> > >    I'm not sure that the per-file part of this is necessary, doing
> > > everything per-inode should be okay.   What are the benefits?
> > >
> >
> > I agree that semantics are simpler with per-inode.
> > The only benefit I see to per-file is the lifetime of the mapping.
> >
> > It is very easy IMO to program with a mapping scope of
> > open-to-close that is requested by FOPEN_PASSTHROUGH
> > and FOPEN_PASSTHROUGH_AUTO_CLOSE.
>
> Right, and in this case the resource limiting is also easy to think about.
>
> I'm not worried about consistency, fuse server can do whatever it
> wants with the data anyway.  I am worried about the visibility of the
> mapping.  One idea would be to create a kernel thread for each fuse sb
> instance and install mapped files into that thread's file table.  In
> theory this should be trivial as the VFS has all the helpers that can
> do this safely.
>

Sounds doable.
I will look into this after I get the basics sorted out.

> >
> > I think if I can make this patch set work per-inode, the roadmap
> > from here to FUSE-BPF would be much more clear.
>
> One advantage of per-inode non-autoclose would be that open could be
> done without a roundtrip to the server.   However the resource
> limiting becomes harder to think about.
>
> So it might make sense to just create two separate modes:
>
>  - per-open unmap-on-release (autoclose)
>  - per-inode unmap-on-forget (non-autoclose, mapping can be torn down
> explicitly)
>

[...]

> > In summary, I will try to come up with v14 that is:
> > - privileged user only
> > - no resource limitation
> > - per-inode mapping
>
> Okay, that's a logical first step.

I said that I would try to start with a per-inode operation mode,
but I realize that it does not meet one of my basic project requirements -
I need to be able to pass through some of the fds of the same inode,
but not all of them.

I was thinking of a slightly different model that could (possibly)
unify those two modes and be flexible enough to be extended with
BPF filters going forward.

The model is based on per-inode association to backing fd.

1. A single association (mapping) can be created per-inode using an ioctl
 - There is no mapping id - the inode either has a backing_fd or not
 - Trying to set another backing_fd for an inode fails with EEXIST if one
   already exists
 - A backing_fd mapping can be torn down with an ioctl
 - The backing_fd is of course auto-closed on forget

2. The backing_fd association itself does not cause any passthrough!
 - Passthrough operations need to be opted into independently of mapping
   the backing_fd
 - Down the road, a passthrough operation mask could be set up in the
   mapping
 - Down the road, a BPF program to decide on passthrough per operation
   could be set up in the mapping, as the BPF patches intended

3. Initially, the only way to opt in to passthrough read/write operations
    is by passing the FOPEN_PASSTHROUGH flag on open
 - FOPEN_PASSTHROUGH will have no effect if a backing_fd wasn't
   mapped beforehand
 - As long as there are FUSE files open with FOPEN_PASSTHROUGH,
   the inode's backing_fd cannot be unmapped
 - If a file is opened with FOPEN_PASSTHROUGH_AUTOCLOSE,
   when that file is closed, *if it is the last file referencing* the inode,
   the backing_fd is auto-closed

This is not as flexible as being able to map each FUSE fd to a different
backing_fd.

In the future, FUSE fds could have their own individual backing_fd if
needed, but for now, I think that starting with a single shared backing_fd
with per-fd opt-in on open would be simpler to implement and still useful.

One obvious downside of the shared backing_fd approach is that
if the FUSE fds are a mix of O_RDONLY and O_WRONLY, the shared
backing_fd needs to be set up as O_RDWR in advance.

I think this is not such a strict limitation for the first implementation,
since we already agreed that it would require the server to be privileged.

Do you think this is an acceptable first step?

Thanks,
Amir.



