Re: [PATCH RESEND V12 3/8] fuse: Definitions and ioctl for passthrough

Amir Goldstein <amir73il@xxxxxxxxx> · Mon, 12 Sep 2022 15:29:46 +0300

On Mon, Sep 12, 2022 at 12:29 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Sat, 10 Sept 2022 at 10:52, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > I think we should accept the fact that just as any current FUSE
> > passthrough (in userspace) implementation is limited to max number of
> > open files as the server's process limitation, kernel passthrough implementation
> > will be limited by inheriting the mounter's process limitation.
> >
> > There is no reason that the server should need to keep more
> > passthrough fd's open than client open fds.
>
> Maybe you're right.
>
> > If we only support FOPEN_PASSTHROUGH_AUTOCLOSE as v12
> > patches implicitly do, then the memory overhead is not much different
> > than the extra overlayfs pseudo realfiles.
>
> How exactly would this work?
>
> ioctl(F_D_I_P_OPEN) - create passthrough fd with ref 1
> open/FOPEN_PASSTHROUGH - inc refcount in passthrough fd
> release - put refcount in passthrough fd
> ioctl(F_D_I_P_CLOSE) - put ref in passthrough fd
>
> Due to being refcounted the F_D_I_P_CLOSE can come at any point past
> the finished open request.
>
> Or did you have something else in mind?
>

What I had in mind is that FOPEN_PASSTHROUGH_AUTOCLOSE
"transfers" the server's refcount to the kernel and server does
not need to call explicit F_D_I_P_CLOSE.

This is useful for servers that don't care about reusing mappings.
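
To make the difference concrete, here is a rough userspace model of the
two refcount flows (the explicit F_D_I_P_CLOSE one from your list and
the AUTOCLOSE one). This is only a sketch: struct pt_map and all the
pt_* helpers are names I made up for illustration, they are not from
the v12 patches and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one passthrough mapping (illustration only). */
struct pt_map {
	int refs;	/* outstanding references */
	bool released;	/* true once the last reference was dropped */
};

static void pt_get(struct pt_map *m)
{
	assert(!m->released);
	m->refs++;
}

static void pt_put(struct pt_map *m)
{
	assert(m->refs > 0);
	if (--m->refs == 0)
		m->released = true;	/* backing file would be dropped here */
}

/* ioctl(F_D_I_P_OPEN): the server creates the mapping with ref 1. */
static void pt_ioctl_open(struct pt_map *m)
{
	m->refs = 1;
	m->released = false;
}

/*
 * Open reply carrying FOPEN_PASSTHROUGH.
 * Without autoclose the open file takes its own reference.
 * With autoclose the server's reference is handed over to the open
 * file, so no extra reference is taken.
 */
static void pt_open_reply(struct pt_map *m, bool autoclose)
{
	if (!autoclose)
		pt_get(m);
}

/* release: the open file drops its reference. */
static void pt_release(struct pt_map *m)
{
	pt_put(m);
}

/* ioctl(F_D_I_P_CLOSE): the server drops its own reference. */
static void pt_ioctl_close(struct pt_map *m)
{
	pt_put(m);
}
```

In the explicit flow the sequence is pt_ioctl_open, pt_open_reply(false),
then pt_ioctl_close and pt_release in either order. In the AUTOCLOSE flow
it is just pt_ioctl_open, pt_open_reply(true), pt_release.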

> > > One other question that's nagging me is how to "unhide" these pseudo-fds.
> > >
> > > Could we create a kernel thread for each fuse sb which has normal
> > > file-table for these?  This would allow inspecting state through
> > > /proc/$KTHREADID/fd, etc..
> > >
> >
> > Yeah that sounds like a good idea.
> > As I mentioned elsewhere in the thread, io_uring also has a mechanism
> > to register open files with the kernel to perform IO on them later.
> > I assume those files are also visible via some /proc/$KTHREADID/fd,
> > but I'll need to check.
> >
> > BTW, I see that the Android team is presenting eBPF-FUSE on LPC
> > coming Tuesday [1].
>
> At first glance it looks like a filtered kernel-only passthrough +
> fuse fallback, where filtering is provided by eBPF scripts and only
> falls back to userspace access on more complex cases.  Maybe it's a
> good direction, we'll see.

Yeah, we'll see.

> Apparently the passthrough case is
> important enough for various use cases.
>

Indeed.
My use case is HSM and I think that using FUSE for HSM is becoming
more and more common these days.

One of the things that bothers me is that both this FUSE_PASSTHROUGH
patch set and any future eBPF-FUSE passthrough implementation are
bound to duplicate a lot of code and know-how from overlayfs
(along with the bugs).

We could try to factor out some common bits to a kernel fs passthrough
library.

Another option to consider is not to add any passthrough logic
to FUSE at all.

Instead, implement a "switch" fs to choose between passthrough
to one of several underlying fs "branches", where one of the branches
could be local fs and another a FUSE fs (i.e. for the complex cases).

A similar design was described at:
https://github.com/github/libprojfs/blob/master/docs/design.md#phase-2--hybrid

This "switch" fs is not that much different from overlayfs, when
removing the "merge dir" logic and replacing the "is_upper" logic
with a generic eBPF "choose_branch" logic.

Food for thought.

Thanks,
Amir.