On Mon, Sep 12, 2022 at 12:29 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Sat, 10 Sept 2022 at 10:52, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > I think we should accept the fact that just as any current FUSE
> > passthrough (in userspace) implementation is limited to max number of
> > open files as the server's process limitation, kernel passthrough implementation
> > will be limited by inheriting the mounter's process limitation.
> >
> > There is no reason that the server should need to keep more
> > passthrough fd's open than client open fds.
>
> Maybe you're right.
>
> > If we only support FOPEN_PASSTHROUGH_AUTOCLOSE as v12
> > patches implicitly do, then the memory overhead is not much different
> > than the extra overlayfs pseudo realfiles.
>
> How exactly would this work?
>
> ioctl(F_D_I_P_OPEN) - create passthrough fd with ref 1
> open/FOPEN_PASSTHROUGH - inc refcount in passthrough fd
> release - put refcount in passthrough fd
> ioctl(F_D_I_P_CLOSE) - put ref in passthrough fd
>
> Due to being refcounted the F_D_I_P_CLOSE can come at any point past
> the finished open request.
>
> Or did you have something else in mind?
>

What I had in mind is that FOPEN_PASSTHROUGH_AUTOCLOSE "transfers" the
server's refcount to the kernel, so the server does not need to call
F_D_I_P_CLOSE explicitly. This is useful for servers that don't care
about reusing mappings.

> > > One other question that's nagging me is how to "unhide" these pseudo-fds.
> > >
> > > Could we create a kernel thread for each fuse sb which has normal
> > > file-table for these? This would allow inspecting state through
> > > /proc/$KTHREDID/fd, etc.
> > >
> >
> > Yeah that sounds like a good idea.
> > As I mentioned elsewhere in the thread, io_uring also has a mechanism
> > to register open files with the kernel to perform IO on them later.
> > I assume those files are also visible via some /proc/$KTHREDID/fd,
> > but I'll need to check.
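To make the refcount lifecycle from the open/release/close sequence above concrete, here is a minimal user-space model of it. This is a sketch under assumptions, not the patch set's code: struct passthrough_fd and the pt_* helpers are hypothetical names, and "freeing the backing file" is modeled as a counter. With autoclose=false it follows the explicit F_D_I_P_CLOSE scheme; with autoclose=true the server's initial reference is handed to the opened file, so no close ioctl is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a passthrough fd; not the kernel's struct. */
struct passthrough_fd {
    int refcount;   /* refs held by the server and by open FUSE files */
    bool released;  /* set once the backing file would be dropped */
};

/* Counts modeled "backing file" releases, for illustration only. */
static int backing_files_freed;

static void pt_get(struct passthrough_fd *pfd)
{
    assert(!pfd->released && pfd->refcount > 0);
    pfd->refcount++;
}

static void pt_put(struct passthrough_fd *pfd)
{
    assert(!pfd->released && pfd->refcount > 0);
    if (--pfd->refcount == 0) {
        pfd->released = true;
        backing_files_freed++;
    }
}

/* ioctl(F_D_I_P_OPEN): create the passthrough fd with the server's ref. */
static struct passthrough_fd *pt_ioctl_open(void)
{
    struct passthrough_fd *pfd = calloc(1, sizeof(*pfd));
    if (pfd)
        pfd->refcount = 1;
    return pfd;
}

/* Open reply with FOPEN_PASSTHROUGH: the new file takes its own ref.
 * Assumed AUTOCLOSE semantics: the server's ref is transferred instead,
 * so the count is unchanged and the server must not close it later. */
static void pt_fopen(struct passthrough_fd *pfd, bool autoclose)
{
    if (!autoclose)
        pt_get(pfd);
}

/* release of the FUSE file: drop the file's ref. */
static void pt_release(struct passthrough_fd *pfd) { pt_put(pfd); }

/* ioctl(F_D_I_P_CLOSE): the server drops its own ref. */
static void pt_ioctl_close(struct passthrough_fd *pfd) { pt_put(pfd); }
```

In the explicit scheme the close ioctl may arrive any time after the open reply, and the backing file goes away only when the last reference (server or file) is dropped. In the autoclose variant each passthrough fd serves exactly one open, which is why it suits servers that don't reuse mappings. The model leaves the struct allocated so its state can be inspected; the caller frees it.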
> >
> > BTW, I see that the Android team is presenting eBPF-FUSE on LPC
> > coming Tuesday [1].
>
> At first glance it looks like a filtered kernel-only passthrough +
> fuse fallback, where filtering is provided by eBPF scripts and only
> falls back to userspace access on more complex cases. Maybe it's a
> good direction, we'll see.

Yeah, we'll see.

> Apparently the passthrough case is
> important enough for various use cases.
>

Indeed. My use case is HSM (hierarchical storage management), and I think
that using FUSE for HSM is becoming more and more common these days.

One of the things that bothers me is that both this FUSE_PASSTHROUGH
patch set and any future eBPF-FUSE passthrough implementation are bound
to duplicate a lot of code and know-how from overlayfs (along with the
bugs). We could try to factor out some common bits into a kernel fs
passthrough library.

Another option to consider is not to add any passthrough logic to FUSE
at all. Instead, implement a "switch" fs that passes through to one of
several underlying fs "branches", where one of the branches could be a
local fs and another a FUSE fs (i.e. for the complex cases).
A similar design was described at:
https://github.com/github/libprojfs/blob/master/docs/design.md#phase-2--hybrid

This "switch" fs is not that much different from overlayfs once the
"merge dir" logic is removed and the "is_upper" logic is replaced with
a generic eBPF "choose_branch" logic.

Food for thought.

Thanks,
Amir.