Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE

On Tue, Sep 27, 2022 at 11:41:50PM -0700, Martin KaFai Lau wrote:
> On 9/26/22 4:17 PM, Daniel Rosenberg wrote:
> > These patches extend FUSE to be able to act as a stacked filesystem. This
> > allows pure passthrough, where the fuse file system simply reflects the lower
> > filesystem, and also allows optional pre and post filtering in BPF and/or the
> > userspace daemon as needed. This can dramatically reduce or even eliminate
> > transitions to and from userspace.
> > 
> > Currently, we either set the backing file/bpf at mount time at the root level,
> > or at lookup time, via an optional block added at the end of the lookup return
> > call. The added lookup block contains an fd for the backing file/folder and bpf
> > if necessary, or a signal to clear or inherit the parent values. We're looking
> > into two options for extending this to mkdir/mknod/etc, as we currently only
> > support setting the backing to a pre-existing file, although naturally you can
> > create new ones. When we're doing a lookup for create, we could pass an
> > fd for the parent dir and the name of the backing file we're creating. This has
> > the benefit of avoiding an additional call to userspace, but requires hanging
> > on to some data in a negative dentry where there is no elegant place to store it.
> > Another option is adding the same block we added to lookup to the create type
> > op codes. This keeps that code more uniform, but means userspace must implement
> > that logic in more areas.
> > 
> > As is, the patches definitely need some work before they're ready. We still
> > need to go through and ensure we respect changed filter values/disallow changes
> > that don't make sense. We aren't currently calling mnt_want_write for the lower
> > calls where appropriate, and we don't have an override_creds layer either. We
> > also plan to add to our read/write iter filters to allow for more interesting
> > use cases. There are also probably some node id inconsistencies. For nodes that
> > will be completely passthrough, we give an id of 0.
> > 
> > For the BPF verification side, we have currently set things up in the old
> > style, with a new bpf program type and helper functions. From LPC, my
> > understanding is that newer bpf additions are done in a new style, so I imagine
> > much of that will need to be redone as well, but hopefully these patches get
> > across what our needs there are.
> > 
> > For testing, we've provided the selftest code we have been using. We also have
> > a mode to run with no userspace daemon in a pure passthrough mode that I have
> > been running xfstests over to get some coverage on the backing operation code.
> > I had to modify mounts/unmounts to get that running, along with some other
> > small touch ups. The most notable failure I currently see there is in
> > generic/126, which I suspect is likely related to override_creds.
> > 
> 
> Interesting idea.
> 
> Some comments on review logistics:
> - The set is too long and some of the individual patches are way too long
> for one single patch to review.  Keep in mind that not all of us here are
> experts in both fuse and bpf.  Making it easier to review first will help at
> the beginning.  Some ideas:
> 
>   - Only implement a few ops in the initial revision. From quickly browsing
> the set, it is implementing the 'struct file_operations
> fuse_file_operations'? Maybe the first few revisions can start with a few of
> the ops first.
> 

I had a similar thought when poking through this. A related question is
how much of a functional dependency the core passthrough mechanism has
on bpf. If bpf is optional for filtering purposes and
isn't absolutely necessary to set up a basic form of passthrough, I
think review would be made easier by splitting off those core bits from
the bpf components so each part is easier to review by people who know
them best. For example, introduce all the fuse enhancements, hooks and
cleanups to set up a passthrough to start the series, then plumb in the
bpf filtering magic on top. Hm?

FWIW, if this is an RFC/prototype and you want more efficient review
cycles, another idea to take that a step further could be to start with
read-only support (or maybe even just directory walking?).

BTW if the bpf bits are optional, how might one mount a fuse/no
daemon/passthrough filesystem from userspace? Is that possible with this
series as is?

Something more on the fuse side.. it looks like we introduce a pattern
where bits of generic request completion processing can end up
duplicated between the shortcut (i.e. _backing()/_finalize()) handlers
and the traditional post request code, because the shortcuts basically
bypass the entire rest of the codepath. For example, something like
create_new_entry() is currently reused for several inode creation
operations. With passthrough mode, it looks like some of that code (i.e.
vfs dentry fixups) is split off from create_new_entry() into each
individual backing mode handler.

It looks like much of the lower level request processing code was
refactored into the fuse_iqueue to support things like virtiofs. Have
you looked into whether that abstraction can be reused or enhanced to
support bpf filtering, direct passthrough calls, etc.? Or perhaps
whether more of the higher level code could be refactored in a similar
way to encourage more reuse and avoid branching off every fs operation
into a special passthrough codepath?
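
To make the reuse question a bit more concrete, here is a rough userspace
sketch of the shape I have in mind. Everything in it is hypothetical: the
names (struct request, send_to_daemon, send_to_backing, complete_new_entry,
do_create) are stand-ins I made up for illustration and do not exist in the
kernel or in this series. The point is only that the transport (daemon vs.
backing file) is selected in one place, while the post-request fixups are
written once and shared by both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the kernel or the series. */
enum transport { TRANSPORT_DAEMON, TRANSPORT_BACKING };

struct request {
	const char *op;   /* operation name, e.g. "mkdir" */
	int result;       /* filled in by whichever transport handled it */
	int finalized;    /* set by the shared completion step */
};

/* A transport only knows how to produce a result for a request. */
typedef int (*send_fn)(struct request *req);

static int send_to_daemon(struct request *req)
{
	req->result = 1;  /* pretend the userspace daemon answered */
	return 0;
}

static int send_to_backing(struct request *req)
{
	req->result = 2;  /* pretend the lower filesystem answered */
	return 0;
}

static const send_fn transports[] = {
	[TRANSPORT_DAEMON]  = send_to_daemon,
	[TRANSPORT_BACKING] = send_to_backing,
};

/* Shared post-processing (think dentry fixups), written exactly once. */
static void complete_new_entry(struct request *req)
{
	req->finalized = 1;
}

/* One entry point per operation; the passthrough case is not a separate copy. */
static int do_create(enum transport t, struct request *req)
{
	int err;

	assert(req != NULL);
	err = transports[t](req);
	if (err)
		return err;
	complete_new_entry(req);
	return 0;
}
```

With something along these lines, do_create(TRANSPORT_BACKING, &req) and
do_create(TRANSPORT_DAEMON, &req) would both run the same completion step,
so the fixups would not need to be duplicated into each backing handler.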

Brian

>   - Please make the patches apply cleanly to the bpf-next tree. For example,
> in patch 3, where is 18e2ec5bf453 coming from? I cannot find it in either
> the bpf-next or linux-next tree.
>   - Without applying it to an upstream tree cleanly, in a big set like this,
> I have no idea when bpf_prog_run() is called in patch 24 because the diff
> context is in fuse_bpf_cleanup and apparently it is not where the bpf prog
> is run.
> 
> Some high level comments on the set:
> - Instead of adding bpf helpers, you should consider kfunc instead. You can
> take a look at the recent HID patchset v10 or the recent nf conntrack bpf
> set.
> 
> - Instead of expressing as packet data, using the recent dynptr is a better
> way to go for handling a mem blob.
> 
> - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
> file_operations' without going back to the user daemon? Have you looked at
> struct_ops which seems to be a better fit here?  If the bpf prog does not
> know how to handle an operation (or file?), it can call fuse_file_llseek
> (for example) as a kfunc to handle the request.
> 
> - The test SEC("test_trace") seems mostly a synthetic test for checking
> correctness.  Is there a test that shows a more real-life use case, or have
> I missed something in patch 26?
> 
> - Please use the skel to load the program.  It is pretty hard to read the
> loader in patch 26.
> 
> - I assume the main objective is performance, by not going back to the
> user daemon?  Do you have performance numbers?
> 
