On Sat, 20 May 2023 at 12:20, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Fri, May 19, 2023 at 6:13 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > On Fri, 19 May 2023 at 14:57, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > From: Alessio Balsini <balsini@xxxxxxxxxxx> > > > > > > Expose the FUSE_PASSTHROUGH capability to user space and declare all the > > > basic data structures and functions as the skeleton on top of which the > > > FUSE passthrough functionality will be built. > > > > > > As part of this, introduce the new FUSE passthrough ioctl, which allows > > > the FUSE daemon to specify a direct connection between a FUSE file and a > > > backing file. The ioctl requires user space to pass the file descriptor > > > of one of its opened files to the FUSE driver and get an id in return. > > > This id may be passed in a reply to OPEN with flag FOPEN_PASSTHROUGH > > > to setup passthrough of read/write operations on the open file. > > > > > > Also, add the passthrough functions for the set-up and tear-down of the > > > data structures and locks that will be used both when fuse_conns and > > > fuse_files are created/deleted. > > > > > > Signed-off-by: Alessio Balsini <balsini@xxxxxxxxxxx> > > > Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx> > > > --- > > > fs/fuse/Makefile | 1 + > > > fs/fuse/dev.c | 33 ++++++++++++++++++++++++-------- > > > fs/fuse/dir.c | 7 ++++++- > > > fs/fuse/file.c | 17 +++++++++++++---- > > > fs/fuse/fuse_i.h | 27 ++++++++++++++++++++++++++ > > > fs/fuse/inode.c | 21 +++++++++++++++++++- > > > fs/fuse/passthrough.c | 40 +++++++++++++++++++++++++++++++++++++++ > > > include/uapi/linux/fuse.h | 13 +++++++++++-- > > > 8 files changed, 143 insertions(+), 16 deletions(-) > > > create mode 100644 fs/fuse/passthrough.c > > > > > > diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile > > > index 0c48b35c058d..d9e1b47382f3 100644 > > > --- a/fs/fuse/Makefile > > > +++ b/fs/fuse/Makefile > > > @@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o > > > obj-$(CONFIG_VIRTIO_FS) += virtiofs.o > > > > > > fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o > > > +fuse-y += passthrough.o > > > fuse-$(CONFIG_FUSE_DAX) += dax.o > > > > > > virtiofs-y := virtio_fs.o > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > > index 1a8f82f478cb..cb00234e7843 100644 > > > --- a/fs/fuse/dev.c > > > +++ b/fs/fuse/dev.c > > > @@ -2255,16 +2255,19 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, > > > unsigned long arg) > > > { > > > int res; > > > - int oldfd; > > > - struct fuse_dev *fud = NULL; > > > + int fd, id; > > > + struct fuse_dev *fud = fuse_get_dev(file); > > > > This is broken, see below. > > IIUC, *this* is not broken for the new ioctls... > > > > > > struct fd f; > > > > > > + if (!fud) > > > + return -EINVAL; > > > + This is also broken for the old ioctl. > > > switch (cmd) { > > > case FUSE_DEV_IOC_CLONE: > > > - if (get_user(oldfd, (__u32 __user *)arg)) > > > + if (get_user(fd, (__u32 __user *)arg)) > > > return -EFAULT; > > > > > > - f = fdget(oldfd); > > > + f = fdget(fd); > > > if (!f.file) > > > return -EINVAL; > > > > > > @@ -2272,17 +2275,31 @@ static long fuse_dev_ioctl(struct file *file, unsigned int cmd, > > > * Check against file->f_op because CUSE > > > * uses the same ioctl handler. > > > */ > > > - if (f.file->f_op == file->f_op) > > > - fud = fuse_get_dev(f.file); > > > - > > > res = -EINVAL; > > > - if (fud) { > > > + if (f.file->f_op == file->f_op) { > > and this can be fixed by adding: > + fud = fuse_get_dev(f.file); Yes, but it's still messy. I suggest separating out unrelated ioctl commands into different functions. Not sure if it's worth doing the open/close in a common function, I'll leave that to you. [snip] > > Seems too restrictive. We could specify the max stacking depth in the > > protocol and verify that when registering the passthrough file. > > > > I.e. fuse_sb->s_stack_depth of > > > > 0 -> passthrough disabled > > 1 -> backing_sb->s_stack_depth == 0 > > 2 -> backing_sb->stack_depth <= 1 > > ... > > > > Ok. Let's see. > What do we stand to gain from the ability to setup nax stacking depth? > > We could use it to setup an overlayfs with lower FUSE that allows passthrough > fds to a non-stacked backing fs and we could use it to setup FUSE that allows > passthrough fds to overlayfs. > > I pity the FUSE userspace developers that will need to understand this > setup parameter... I guess libfuse could parse it with other common options. It's something that needs to be tuned on a per-case basis, not something the filesystem designer can predict. Would be better if we could have a per-inode stack depth and then this wouldn't have to be tuned. Is that feasible? > So ignoring the possibility of FILESYSTEM_MAX_STACK_DEPTH changing in > the future, maybe better describe this with two capability flags > instead of an int? The max depth could be changed, this value was just chosen because we didn't have a use case for a larger fs stack, I guess. Some analysis about the kernel stack usage would be required before doing so (btw. such analysis was never done, so it would be useful regardless). Thanks, Miklos