Re: [PATCH 00/26] FUSE BPF: A Stacked Filesystem Extension for FUSE

Daniel Rosenberg <drosen@xxxxxxxxxx> · Fri, 30 Sep 2022 17:05:42 -0700

On Tue, Sep 27, 2022 at 11:41 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> Interesting idea.
>
> Some comments on review logistics:
> - The set is too long and some of the individual patches are way too long for
> one single patch to review.  Keep in mind that not all of us here are experts in
> both fuse and bpf.  Making it easier to review first will help at the beginning.
>   Some ideas:
>
>    - Only implement a few ops in the initial revision. From quickly browsing the
> set, it is implementing the 'struct file_operations fuse_file_operations'?
> Maybe the first few revisions can start with a few of the ops first.
>

I've split it up a fair bit already, do you mean just sending a subset
of them at a time? I think the current splitting roughly allows for
that. Patch 1-4 and 5 deal with bpf/verifier code which isn't used
until patch 24. I can reorder/split up the opcodes arbitrarily.
Putting the op codes that implement file passthrough first makes
sense. The code is much easier to test when all/most are present,
since then I can just use patch 23 to mount without a daemon and run
xfs tests on them. At least initially I felt the whole stack was
useful to give the full picture.

>    - Please make the patches that can be applied to the bpf-next tree cleanly.
> For example, in patch 3, where is 18e2ec5bf453 coming from? I cannot find it in
> bpf-next and linux-next tree.
>    - Without applying it to an upstream tree cleanly, in a big set like this, I
> have no idea when bpf_prog_run() is called in patch 24 because the diff context
> is in fuse_bpf_cleanup and apparently it is not where the bpf prog is run.
>

Currently this is based off of
bf682942cd26ce9cd5e87f73ae099b383041e782 in
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
I would have rebased on top of bpf-next, except that from my
conversations at plumbers, I figured that the set up would need to
change significantly, and that effort would be wasted. My goal
including them here was to give more of a sense of what our needs are,
and be a starting point for working out what we really ought to be
using.

> Some high level comments on the set:
> - Instead of adding bpf helpers, you should consider kfunc instead. You can take
> a look at the recent HID patchset v10 or the recent nf conntrack bpf set.
>
> - Instead of expressing as packet data, using the recent dynptr is a better way
> to go for handling a mem blob.
>

I'll look into those, I remember them coming up at LPC. My current use
of packets/buffers does seem to abuse their intended meaning a bit.

> - iiuc, the idea is to allow bpf prog to optionally handle the 'struct
> file_operations' without going back to the user daemon? Have you looked at
> struct_ops which seems to be a better fit here?  If the bpf prog does not know
> how to handle an operation (or file?), it can call fuse_file_llseek (for
> example) as a kfunc to handle the request.
>

I wasn't aware of struct_ops. It looks like that may work for us
instead of making a new prog type. I'll definitely look into that.
I'll likely sign up for the bpf office hours next week.

> - The test SEC("test_trace") seems mostly a synthetic test for checking
> correctness.  Does it have a test that shows a more real life use case? or I
> have missed things in patch 26?
>

Patch 26 is pretty much all synthetic tests. A lot of them are just
ensuring that we even call in to the bpf program, and some limited
testing that changing the filters has the expected results. We
mentioned a few more concrete usecases in the LPC talk. One of those
is folder hiding. In Android we've had an issue with leaking the
existence of some apps to other apps. We needed to hide certain
directories from apps where they do have permissions to traverse the
directory. Attempting to access a folder without permissions would
result in EPERM, revealing the existence of that folder. Since the
application doesn't have permission to create arbitrary folders at
that level, we can hide it by using fuse-bpf to change the EPERM into
an ENOENT, and then filter readdir to remove disallowed entries. You
can see something like that in bpf_test_redact_readdir. We also have
some file level redaction. If an app doesn't have location
permissions, but does have file permissions, they could just read
picture metadata to get location information. We could have bpf
redirect reads that might contain location data to the daemon, while
passing through other parts.

> - Please use the skel to load the program.  It is pretty hard to read the loader
> in patch 26.

Yeah, patch 26 is not in great shape currently. I included it mostly
as something that exercises the code, and contains some example bpf
programs. Any suggestions on setting up the tests better are
appreciated.

>
> - I assume the main objective is for performance by not going back to the user
> daemon?  Do you have performance number?
>

I don't have any on hand from the current version. It's a little
tricky to know what numbers are relevant here since the numbers will
change greatly depending on what you do with it. In pure passthrough
without a bpf program, we were seeing performance pretty comparable to
the lower filesystem. Depending on how large of a bpf program we use
we were seeing pretty different slowdowns from that, though at least
some of that was from having a large switch statement.

Do you have any suggestions on what to test here? My thoughts would be
comparing lower fs performance to pure passthrough, to maybe the
example ll_passthrough in libfuse, but that doesn't really show any
bpf impact.

-Daniel