On Mon, May 22, 2023 at 5:50 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > On Sat, 20 May 2023 at 12:20, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > On Fri, May 19, 2023 at 6:13 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > From: Alessio Balsini <balsini@xxxxxxxxxxx> > > > > > > > > Expose the FUSE_PASSTHROUGH capability to user space and declare all the > > > > basic data structures and functions as the skeleton on top of which the > > > > FUSE passthrough functionality will be built. > > > > > > > > As part of this, introduce the new FUSE passthrough ioctl, which allows > > > > the FUSE daemon to specify a direct connection between a FUSE file and a > > > > backing file. The ioctl requires user space to pass the file descriptor > > > > of one of its opened files to the FUSE driver and get an id in return. > > > > This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH > > > > to setup passthrough of read/write operations on the open file. > > > > > > > > Also, add the passthrough functions for the set-up and tear-down of the > > > > data structures and locks that will be used both when fuse_conns and > > > > fuse_files are created/deleted. > > > > > > > > Signed-off-by: Alessio Balsini <balsini@xxxxxxxxxxx> > > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > > --- > > > > fs/fuse/Makefile | 1 + > > > > fs/fuse/dev.c | 33 ++++++++++++++++++++++++-------- > > > > fs/fuse/dir.c | 7 ++++++- > > > > fs/fuse/file.c | 17 +++++++++++++---- > > > > fs/fuse/fuse_i.h | 27 ++++++++++++++++++++++++++ > > > > fs/fuse/inode.c | 21 +++++++++++++++++++- > > > > fs/fuse/passthrough.c | 40 +++++++++++++++++++++++++++++++++++++++ > > > > include/uapi/linux/fuse.h | 13 +++++++++++-- > > > > 8 files changed, 143 insertions(+), 16 deletions(-) > > > > create mode 100644 fs/fuse/passthrough.c > > > > > > > > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile > > > > index 0c48b35c058d..d9e1b47382f3 100644 > > > > --- a/fs/fuse/Makefile > > > > +++ b/fs/fuse/Makefile > > > > @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o > > > > obj-$(CONFIG_VIRTIO_FS) += virtiofs.o > > > > > > > > fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o > > > > +fuse-y += passthrough.o > > > > fuse-$(CONFIG_FUSE_DAX) += dax.o > > > > > > > > virtiofs-y := virtio_fs.o > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > > > index 1a8f82f478cb..cb00234e7843 100644 > > > > --- a/fs/fuse/dev.c > > > > +++ b/fs/fuse/dev.c > > > > @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, > > > > unsigned long arg) > > > > { > > > > int res; > > > > - int oldfd; > > > > - struct fuse_dev *fud = NULL; > > > > + int fd, id; > > > > + struct fuse_dev *fud = fuse_get_dev(file); > > > > > > This is broken, see below. > > > > IIUC, *this* is not broken for the new ioctls... > > > > > > > > > struct fd f; > > > > > > > > + if (!fud) > > > > + return -EINVAL; > > > > + > > This is also broken for the old ioctl. > > > > > switch (cmd) { > > > > case FUSE_DEV_IOC_CLONE: > > > > - if (get_user(oldfd, (__u32 __user *)arg)) > > > > + if (get_user(fd, (__u32 __user *)arg)) > > > > return -EFAULT; > > > > > > > > - f = fdget(oldfd); > > > > + f = fdget(fd); > > > > if (!f.file) > > > > return -EINVAL; > > > > > > > > @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, > > > > * Check against file->f_op because CUSE > > > > * uses the same ioctl handler. > > > > */ > > > > - if (f.file->f_op == file->f_op) > > > > - fud = fuse_get_dev(f.file); > > > > - > > > > res = -EINVAL; > > > > - if (fud) { > > > > + if (f.file->f_op == file->f_op) { > > > > and this can be fixed by adding: > > + fud = fuse_get_dev(f.file); > > Yes, but it's still messy. > > I suggest separating out unrelated ioctl commands into different > functions. Not sure if it's worth doing the open/close in a common > function, I'll leave that to you. > > [snip] ok. > > > > Seems too restrictive. We could specify the max stacking depth in the > > > protocol and verify that when registering the passthrough file. > > > > > > I.e. fuse_sb->s_stack_depth of > > > > > > 0 -> passthrough disabled > > > 1 -> backing_sb->s_stack_depth == 0 > > > 2 -> backing_sb->stack_depth <= 1 > > > ... > > > > > > > Ok. Let's see. > > What do we stand to gain from the ability to setup nax stacking depth? > > > > We could use it to setup an overlayfs with lower FUSE that allows passthrough > > fds to a non-stacked backing fs and we could use it to setup FUSE that allows > > passthrough fds to overlayfs. > > > > I pity the FUSE userspace developers that will need to understand this > > setup parameter... > > I guess libfuse could parse it with other common options. It's > something that needs to be tuned on a per-case basis, not something > the filesystem designer can predict. > > Would be better if we could have a per-inode stack depth and then this > wouldn't have to be tuned. Is that feasible? > We could do something like: real = d_real(dentry); for (depth = 0; real != dentry && depth < FILESYSTEM_MAX_STACK_DEPTH; dentry = real, real = d_real(dentry)); Which brings up the question - what should fuse_d_real() return if we fuse passthrough is per file and not per inode? Should we add f_real() op? Doing this check on FUSE passthrough open is easy, because FUSE server can fall back to non-passthrough, but what about overlayfs? I don't think we want to fail open in overlayfs because a lower FUSE file turns out to be using the max stack depth. This is getting a bit complicated, so I would like to take a step back and think about which configurations we *really* want/need to support. I already listed FUSE over overlayfs and overlayfs over FUSE - they both seem reasonably likely. How about FUSE over FUSE (nested passthrough) and more combinations if you would like to take stack depth > 2 into design considerations? Do you think we can relax allowed combinations to make things easier? To me, it feels that restricting nested FUSE bypass is not so bad, because server has the option to fallback to non-passthrough at any level. So how about: 1. init: FUSE s_stack_depth = FUSE_PASSTHOUGH ? 1 : 0 2. Let stacking fs set a flag FMODE_PASSTHROUGH on backing file 3. on passthrough setup, check if fuse_file has FMODE_PASSTHROUGH (i.e. is FUSE the top of the stack?) 3.a. if FUSE is top of the stack, allow backing file stack depth 1 3.b. if FUSE is below something, limit backing file stack depth 0 This implies that the API to setup a backing file should provide the fuse_file that is expected to be setup for passthrough, so that server will have an opportunity to react and not reply to the open request with FOPEN_PASSTHROUGH. Alternatively, the client can silently ignore FOPEN_PASSTHROUGH, but that seems less ideal. Thanks, Amir.