Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE

Martin KaFai Lau <martin.lau@xxxxxxxxx> · Tue, 27 Sep 2022 23:41:50 -0700

On 9/26/22 4:17 PM, Daniel Rosenberg wrote:
These patches extend FUSE to be able to act as a stacked filesystem. This
allows pure passthrough, where the fuse file system simply reflects the lower
filesystem, and also allows optional pre and post filtering in BPF and/or the
userspace daemon as needed. This can dramatically reduce or even eliminate
transitions to and from userspace.

Currently, we either set the backing file/bpf at mount time at the root level,
or at lookup time, via an optional block added at the end of the lookup return
call. The added lookup block contains an fd for the backing file/folder and bpf
if necessary, or a signal to clear or inherit the parent values. We're looking
into two options for extending this to mkdir/mknod/etc, as we currently only
support setting the backing to a pre-existing file, although naturally you can
create new ones. When we're doing a lookup for create, we could pass an
fd for the parent dir and the name of the backing file we're creating. This has
the benefit of avoiding an additional call to userspace, but requires hanging
on to some data in a negative dentry where there is no elegant place to store it.
Another option is adding the same block we added to lookup to the create type
op codes. This keeps that code more uniform, but means userspace must implement
that logic in more areas.
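
As a rough illustration of the lookup block described above, the three cases (set a backing fd/bpf, clear, or inherit from the parent) could be modeled as below. This is plain userspace C, not code from the patch set; every type and function name here is hypothetical, and the real layout of the block is not given in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the structures from the patch set. */
enum backing_action {
    BACKING_SET,     /* use the fds carried in this block */
    BACKING_CLEAR,   /* drop any backing file and bpf program */
    BACKING_INHERIT, /* reuse whatever the parent directory has */
};

struct backing_state {
    int backing_fd; /* -1 means no backing file */
    int bpf_fd;     /* -1 means no bpf program */
};

struct lookup_block {
    enum backing_action action;
    struct backing_state state; /* only read when action == BACKING_SET */
};

/* Work out which backing state a child entry ends up with after lookup. */
static struct backing_state resolve_backing(const struct backing_state *parent,
                                            const struct lookup_block *blk)
{
    struct backing_state none = { .backing_fd = -1, .bpf_fd = -1 };

    assert(parent != NULL && blk != NULL);

    switch (blk->action) {
    case BACKING_SET:
        return blk->state;
    case BACKING_INHERIT:
        return *parent;
    case BACKING_CLEAR:
    default:
        return none;
    }
}
```

With a parent state of { 3, 4 }, a block carrying BACKING_INHERIT resolves to { 3, 4 }, and one carrying BACKING_CLEAR resolves to { -1, -1 }.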

As is, the patches definitely need some work before they're ready. We still
need to go through and ensure we respect changed filter values/disallow changes
that don't make sense. We aren't currently calling mnt_want_write for the lower
calls where appropriate, and we don't have an override_creds layer either. We
also plan to add to our read/write iter filters to allow for more interesting
use cases. There are also probably some node id inconsistencies. For nodes that
will be completely passthrough, we give an id of 0.

For the BPF verification side, we have currently set things up in the old
style, with a new bpf program type and helper functions. From LPC, my
understanding is that newer bpf additions are done in a new style, so I imagine
much of that will need to be redone as well, but hopefully these patches get
across what our needs there are.

For testing, we've provided the selftest code we have been using. We also have
a mode to run with no userspace daemon in a pure passthrough mode that I have
been running xfstests over to get some coverage on the backing operation code.
I had to modify mounts/unmounts to get that running, along with some other
small touch ups. The most notable failure I currently see there is in
generic/126, which I suspect is likely related to override_creds.

Interesting idea.

Some comments on review logistics:
- The set is too long, and some of the individual patches are far too long to
review as a single patch.  Keep in mind that not all of us here are experts in
both fuse and bpf.  Making the set easier to review will help at the beginning.
Some ideas:

  - Only implement a few ops in the initial revision. From quickly browsing the
set, it is implementing the 'struct file_operations fuse_file_operations'?
Maybe the first few revisions can start with just a few of those ops.

  - Please make the patches apply cleanly to the bpf-next tree.
For example, in patch 3, where is 18e2ec5bf453 coming from? I cannot find it in
either the bpf-next or the linux-next tree.
  - Without being able to apply it cleanly to an upstream tree, in a big set
like this, I have no idea when bpf_prog_run() is called in patch 24, because the
diff context is in fuse_bpf_cleanup and apparently that is not where the bpf
prog is run.

Some high level comments on the set:
- Instead of adding bpf helpers, you should consider using kfuncs. You can take
a look at the recent HID patchset v10 or the recent nf conntrack bpf set.

- Instead of expressing as packet data, using the recent dynptr is a better way 
to go for handling a mem blob.

- iiuc, the idea is to allow bpf prog to optionally handle the 'struct 
file_operations' without going back to the user daemon? Have you looked at 
struct_ops which seems to be a better fit here?  If the bpf prog does not know 
how to handle an operation (or file?), it can call fuse_file_llseek (for 
example) as a kfunc to handle the request.

- The test SEC("test_trace") seems mostly a synthetic test for checking
correctness.  Is there a test that shows a more real-life use case, or have I
missed something in patch 26?

- Please use the skel to load the program.  It is pretty hard to read the loader 
in patch 26.

- I assume the main objective is performance, by not going back to the user
daemon?  Do you have performance numbers?
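
To make the struct_ops suggestion above a bit more concrete, here is a minimal userspace model of the dispatch idea: an ops table whose members are optional, with a default handler used when a member is missing, and a handler that delegates the cases it does not want. It is only a sketch in plain C; the type and function names are made up, default_llseek() merely stands in for fuse_file_llseek, and none of this is the kernel struct_ops API itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Userspace model only; every name here is hypothetical. */
struct file_model {
    long long pos;
    long long size;
};

/* Stand-in for the default handler (fuse_file_llseek in the mail above). */
static long long default_llseek(struct file_model *f, long long off, int whence)
{
    long long base;

    switch (whence) {
    case SEEK_SET: base = 0;       break;
    case SEEK_CUR: base = f->pos;  break;
    case SEEK_END: base = f->size; break;
    default:       return -1;
    }
    if (base + off < 0)
        return -1;
    f->pos = base + off;
    return f->pos;
}

/* Ops table: a NULL member means "not implemented by the program". */
struct fuse_ops_model {
    long long (*llseek)(struct file_model *f, long long off, int whence);
};

/* Example handler: deals with SEEK_SET itself, delegates everything else. */
static long long prog_llseek(struct file_model *f, long long off, int whence)
{
    if (whence != SEEK_SET || off < 0)
        return default_llseek(f, off, whence);
    f->pos = off;
    return off;
}

/* Dispatcher: use the program's handler if present, else the default. */
static long long dispatch_llseek(const struct fuse_ops_model *ops,
                                 struct file_model *f, long long off, int whence)
{
    assert(ops != NULL && f != NULL);
    if (ops->llseek)
        return ops->llseek(f, off, whence);
    return default_llseek(f, off, whence);
}
```

In this model an empty table ({ NULL }) behaves exactly like the default path, so a program only needs to fill in the operations it cares about.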