Re: [PATCH V8 1/3] fuse: Definitions and ioctl() for passthrough

Amir Goldstein <amir73il@xxxxxxxxx> · Tue, 22 Sep 2020 19:08:48 +0300

On Tue, Sep 22, 2020 at 3:15 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote:
>
> On Fri, Sep 18, 2020 at 10:59:16PM +0300, Amir Goldstein wrote:
> > On Fri, Sep 18, 2020 at 7:33 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote:
> > [...]
> > > > ... for example:
> > > >
> > > >        if (fs_stack_depth && passthrough_sb->s_type == fuse_fs_type) {
> > > >                pr_err("FUSE: stacked passthrough file\n");
> > > >                goto out;
> > > >        }
> > > >
> > > > But maybe we want to ban passthrough to any lower FUSE at least for start.
> > >
> > >
> > > Yes, what I proposed here is very conservative, and your solution sounds
> > > good to me. Unfortunately I don't have a clear idea of what could go wrong
> > > if we relax this constraint. I need some guidance from you experts here.
> > >
> >
> > I guess the main concern would be locking order and deadlocks.
> > With my suggestion I think deadlocks are avoided and I am less sure
> > but think that lockdep should not have false positives either.
> >
> > If people do need the 1-level stacking, I can try to think harder
> > if it is safe and maybe add some more compromise limitations.
> >
> > > What do you think if we keep this overly strict rule for now to avoid
> > > unintended behaviors and come back as we find affected use case?
> > >
> >
> > I can live with that if other designated users don't mind the limitation.
> >
> > I happen to be developing a passthrough FUSE fs [1] myself and
> > I also happen to be using it to pass through to overlayfs.
> > OTOH, the workloads for my use case are mostly large sequential IO,
> > and the hardware can handle the few extra syscalls, so the passthrough
> > fd feature is not urgent for my use case at this point in time.
>
>
> This is something that only happens if the FUSE daemon opens a connection
> wanting FUSE_PASSTHROUGH, so shouldn't affect existing use cases. Or am I
> wrong?

I meant, if I would have expected a significant performance improvement
from FUSE_PASSTHROUGH in my use case, I would have wanted to use it
and then passthrough to overlayfs would have mattered to me more.

> If some users find this limitation to be an issue, we can rethink/relax
> this policy in the future... Switching to something like the solution you
> proposed does not break the current behavior, so we would be able to change
> this with minimal effort.
>

True.

>
> >
> >
> > >
> > > >
> > > > > +       ret = -EEXIST;
> > > >
> > > > Why EEXIST? Why not EINVAL?
> > > >
> > >
> > >
> > > Reaching the stacking limit sounded like an error caused by the undesired
> > > existence of something, thus EEXIST sounded like a good fit.
> > > No problem in changing that to EINVAL.
> > >
> > >
> > >
> > > > > +       if (fs_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
> > > > > +               pr_err("FUSE: maximum fs stacking depth exceeded for passthrough\n");
> > > > > +               goto out;
> > > > > +       }
> > > > > +
> > > > > +       req->args->passthrough_filp = passthrough_filp;
> > > > > +       return 0;
> > > > > +out:
> > > > > +       fput(passthrough_filp);
> > > > > +       return ret;
> > > > > +}
> > > > > +
> > > >
> > > > And speaking of overlayfs, I believe you may be able to test your code with
> > > > fuse-overlayfs (passthrough to upper files).
> > > >
> > ...
> > >
> > > This is indeed a project with several common elements to what we are doing
> > > in Android,
> >
> > Are you in liberty to share more information about the Android project?
> > Is it related to Incremental FS [2]?
> >
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/amir73il/libfuse/commits/cachegwfs
> > [2] https://lore.kernel.org/linux-fsdevel/20190502040331.81196-1-ezemtsov@xxxxxxxxxx/
>
> Thanks for the pointer to cachegwfs, I'm glad I'm seeing more and more
> passthrough file systems in addition to our use case in Android.
>

I am hearing about a lot of these projects.
I think that FUSE_PASSTHROUGH is a very useful feature.
I have an intention to explore passthrough to directory fd for
directory modifications. I sure hope you will beat me to it ;-)

> I'm not directly involved in the Incremental FS project, but, as far as I
> remember, only for the first PoC was actually developed as a FUSE file
> system. Because of the overhead introduced by the user space round-trips,
> that design was left behind and the whole Incremental FS infrastructure
> switched to becoming a kernel module.
> In general, the FUSE passthrough patch set proposed in this series wouldn't
> be helpful for that use case because, for example, Incremental FS requires
> live (de)compression of data, that can only be performed by the FUSE
> daemon.
>

Ext4 supports inline encryption. Btrfs supports encrypted/compressed extents.
No reason for FUSE not to support the same.
Is it trivial? No.
Is it an excuse for not using FUSE and writing a new userspace fs. Not
in my option.

> The specific use case I'm trying to improve with this FUSE passthrough
> series is instead related to the scoped storage feature that we introduced
> in Android 11, that is based on FUSE, and affects those folders that are
> shared among all the Apps (e.g., DCIM, Downloads, etc). More details here:
>

sdcard fs has had a lot of reincarnations :-)

I for one am happy with the return to FUSE.
Instead of saying "FUSE is too slow" and implementing a kernel sdcardfs,
improve FUSE to be faster for everyone - that's the way to go ;-)

> https://developer.android.com/about/versions/11/privacy/storage
>
> With FUSE we now have a flexible way to specify more fine-grained
> permissions (e.g., specify if an App is allowed to access files depenind on
> their type), create private App folders, maintain legacy paths for old
> Apps, manipulate pieces of files at run-time, etc. Forgive me if I forgot
> something. :)
> These extra operations may slower the file system access comprared to a
> native, direct access, but if:
> - the file being accessed requires special treatment before being passed to
>   the requesting App, then further tests will be performed at every
>   read/write operation (with some optimizations). This overhead is of
>   course annoying, but is something we are happy to pay because is
>   beneficial to the user (i.e., improves privacy and security).
> - Instead, if at open time a file is recognized as safe to access and does
>   not require any further enforcement from the FUSE daemon, there's no need
>   to pay for future read/write operations overheads, that wouldn't do
>   anything more than just copying data (possibly with the help of
>   splicing). In this case the FUSE passthrough feature proposed in this
>   series can be enabled to reduce this overhead.
>
> Moreover, some Apps use big files that contain all their resources, then
> access these files at random offsets, not taking advantage of read-ahead
> cache. The same happens for files containing databases.
> In addition, our specific use case involves a FUSE daemon is probably
> heavier than the average passthrough file system (for example those that
> are in libfuse/examples), so reducing user space round trips thanks to the
> patchset proposed here gives a strong improvement.
>
> Anyway, only showing the improvements in our extreme use case would have
> brought a limited case for upstreaming, and that is why the benchmarks I
> ran (presented in the cover letter), are based on the vanilla
> passthrough_hp daemon managing kind of standard storage workloads, and
> still show evident performance improvements.
> Running and sharing benchmarks on Android is also tricky because at the
> time of first submission the most recent public real device was running a
> 4.14 kernel, so that would have been maybe nice to see, but not that
> interesting... :)
>
> I hope I answered all your questions, please let me know if I missed
> something and feel free to ask for more details!
>

Thanks for sharing the details,
Amir.