On Tue, Sep 22, 2020 at 3:15 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote: > > On Fri, Sep 18, 2020 at 10:59:16PM +0300, Amir Goldstein wrote: > > On Fri, Sep 18, 2020 at 7:33 PM Alessio Balsini <balsini@xxxxxxxxxxx> wrote: > > [...] > > > > ... for example: > > > > > > > > if (fs_stack_depth && passthrough_sb->s_type == fuse_fs_type) { > > > > pr_err("FUSE: stacked passthrough file\n"); > > > > goto out; > > > > } > > > > > > > > But maybe we want to ban passthrough to any lower FUSE at least for start. > > > > > > > > > Yes, what I proposed here is very conservative, and your solution sounds > > > good to me. Unfortunately I don't have a clear idea of what could go wrong > > > if we relax this constraint. I need some guidance from you experts here. > > > > > > > I guess the main concern would be locking order and deadlocks. > > With my suggestion I think deadlocks are avoided and I am less sure > > but think that lockdep should not have false positives either. > > > > If people do need the 1-level stacking, I can try to think harder > > if it is safe and maybe add some more compromise limitations. > > > > > What do you think if we keep this overly strict rule for now to avoid > > > unintended behaviors and come back as we find affected use case? > > > > > > > I can live with that if other designated users don't mind the limitation. > > > > I happen to be developing a passthrough FUSE fs [1] myself and > > I also happen to be using it to pass through to overlayfs. > > OTOH, the workloads for my use case are mostly large sequential IO, > > and the hardware can handle the few extra syscalls, so the passthrough > > fd feature is not urgent for my use case at this point in time. > > > This is something that only happens if the FUSE daemon opens a connection > wanting FUSE_PASSTHROUGH, so shouldn't affect existing use cases. Or am I > wrong? I meant, if I would have expected a significant performance improvement from FUSE_PASSTHROUGH in my use case, I would have wanted to use it and then passthrough to overlayfs would have mattered to me more. > If some users find this limitation to be an issue, we can rethink/relax > this policy in the future... Switching to something like the solution you > proposed does not break the current behavior, so we would be able to change > this with minimal effort. > True. > > > > > > > > > > > > > > > > > + ret = -EEXIST; > > > > > > > > Why EEXIST? Why not EINVAL? > > > > > > > > > > > > > Reaching the stacking limit sounded like an error caused by the undesired > > > existence of something, thus EEXIST sounded like a good fit. > > > No problem in changing that to EINVAL. > > > > > > > > > > > > > > + if (fs_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) { > > > > > + pr_err("FUSE: maximum fs stacking depth exceeded for passthrough\n"); > > > > > + goto out; > > > > > + } > > > > > + > > > > > + req->args->passthrough_filp = passthrough_filp; > > > > > + return 0; > > > > > +out: > > > > > + fput(passthrough_filp); > > > > > + return ret; > > > > > +} > > > > > + > > > > > > > > And speaking of overlayfs, I believe you may be able to test your code with > > > > fuse-overlayfs (passthrough to upper files). > > > > > > ... > > > > > > This is indeed a project with several common elements to what we are doing > > > in Android, > > > > Are you in liberty to share more information about the Android project? > > Is it related to Incremental FS [2]? > > > > Thanks, > > Amir. > > > > [1] https://github.com/amir73il/libfuse/commits/cachegwfs > > [2] https://lore.kernel.org/linux-fsdevel/20190502040331.81196-1-ezemtsov@xxxxxxxxxx/ > > Thanks for the pointer to cachegwfs, I'm glad I'm seeing more and more > passthrough file systems in addition to our use case in Android. > I am hearing about a lot of these projects. I think that FUSE_PASSTHROUGH is a very useful feature. I have an intention to explore passthrough to directory fd for directory modifications. I sure hope you will beat me to it ;-) > I'm not directly involved in the Incremental FS project, but, as far as I > remember, only for the first PoC was actually developed as a FUSE file > system. Because of the overhead introduced by the user space round-trips, > that design was left behind and the whole Incremental FS infrastructure > switched to becoming a kernel module. > In general, the FUSE passthrough patch set proposed in this series wouldn't > be helpful for that use case because, for example, Incremental FS requires > live (de)compression of data, that can only be performed by the FUSE > daemon. > Ext4 supports inline encryption. Btrfs supports encrypted/compressed extents. No reason for FUSE not to support the same. Is it trivial? No. Is it an excuse for not using FUSE and writing a new userspace fs. Not in my option. > The specific use case I'm trying to improve with this FUSE passthrough > series is instead related to the scoped storage feature that we introduced > in Android 11, that is based on FUSE, and affects those folders that are > shared among all the Apps (e.g., DCIM, Downloads, etc). More details here: > sdcard fs has had a lot of reincarnations :-) I for one am happy with the return to FUSE. Instead of saying "FUSE is too slow" and implementing a kernel sdcardfs, improve FUSE to be faster for everyone - that's the way to go ;-) > https://developer.android.com/about/versions/11/privacy/storage > > With FUSE we now have a flexible way to specify more fine-grained > permissions (e.g., specify if an App is allowed to access files depenind on > their type), create private App folders, maintain legacy paths for old > Apps, manipulate pieces of files at run-time, etc. Forgive me if I forgot > something. :) > These extra operations may slower the file system access comprared to a > native, direct access, but if: > - the file being accessed requires special treatment before being passed to > the requesting App, then further tests will be performed at every > read/write operation (with some optimizations). This overhead is of > course annoying, but is something we are happy to pay because is > beneficial to the user (i.e., improves privacy and security). > - Instead, if at open time a file is recognized as safe to access and does > not require any further enforcement from the FUSE daemon, there's no need > to pay for future read/write operations overheads, that wouldn't do > anything more than just copying data (possibly with the help of > splicing). In this case the FUSE passthrough feature proposed in this > series can be enabled to reduce this overhead. > > Moreover, some Apps use big files that contain all their resources, then > access these files at random offsets, not taking advantage of read-ahead > cache. The same happens for files containing databases. > In addition, our specific use case involves a FUSE daemon is probably > heavier than the average passthrough file system (for example those that > are in libfuse/examples), so reducing user space round trips thanks to the > patchset proposed here gives a strong improvement. > > Anyway, only showing the improvements in our extreme use case would have > brought a limited case for upstreaming, and that is why the benchmarks I > ran (presented in the cover letter), are based on the vanilla > passthrough_hp daemon managing kind of standard storage workloads, and > still show evident performance improvements. > Running and sharing benchmarks on Android is also tricky because at the > time of first submission the most recent public real device was running a > 4.14 kernel, so that would have been maybe nice to see, but not that > interesting... :) > > I hope I answered all your questions, please let me know if I missed > something and feel free to ask for more details! > Thanks for sharing the details, Amir.