Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE

Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> · Wed, 17 May 2023 15:19:28 +0800

On 2023/5/17 00:05, Gao Xiang wrote:
Hi Amir,

On 2023/5/17 23:51, Amir Goldstein wrote:
On Wed, May 17, 2023 at 5:50 AM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:

On 2023/5/2 17:07, Daniel Rosenberg wrote:
On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:

The security model needs to be thought about and documented.  Think
about this: the fuse server now delegates operations it would itself
perform to the passthrough code in fuse.  The permissions that would
have been checked in the context of the fuse server are now checked in
the context of the task performing the operation.  The server may be
able to bypass seccomp restrictions.  Files that are open on the
backing filesystem are now hidden (e.g. lsof won't find these), which
allows the server to obfuscate accesses to backing files.  Etc.

These are not particularly worrying if the server is privileged, but
fuse comes with the history of supporting unprivileged servers, so we
should look at supporting passthrough with unprivileged servers as
well.

This is on my todo list. My current plan is to grab the creds that the
daemon uses to respond to FUSE_INIT. That should keep behavior fairly
similar. I'm not sure if there are cases where the fuse server is
operating under multiple contexts.
I don't currently have a plan for exposing open files via lsof. Every
such file should relate to one that will show up though. I haven't dug
into how that's set up, but I'm open to suggestions.

My other generic comment is that you should add justification for
doing this in the first place.  I guess it's mainly performance.  So
how can performance be won in real-life cases?  It would also be good
to measure the contribution of individual ops to that win.  Is there
another reason for this besides performance?

Thanks,
Miklos

Our main concern with it is performance. We have some preliminary
numbers looking at the pure passthrough case. We've been testing using
a ramdrive on a somewhat slow machine, as that should highlight
differences more. We ran fio for sequential reads, and random
read/write. For sequential reads, we were seeing libfuse's
passthrough_hp take about a 50% hit, with fuse-bpf not being
detectably slower. For random read/write, we were seeing a roughly 90%
drop in performance from passthrough_hp, while fuse-bpf has about a 7%
drop in read and write speed. When we use a bpf that traces every
opcode, that performance hit increases to a roughly 1% drop in
sequential read performance, and a 20% drop in both read and write
performance for random read/write. We plan to make more complex bpf
examples, with fuse daemon equivalents to compare against.

We have not looked closely at the impact of individual opcodes yet.

There's also a potential ease of use for fuse-bpf. If you're
implementing a fuse daemon that is largely mirroring a backing
filesystem, you only need to write code for the differences in
behavior. For instance, say you want to remove image metadata like
location. You could give bpf information on what range of data is
metadata, and zero out that section without having to handle any other
operations.
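
To make the metadata example above concrete, here is a minimal
userspace sketch in plain C of the range-zeroing step only. It is not
taken from the patch set and does not use the fuse-bpf API; the names
byte_range and scrub_range are hypothetical, and the assumption is that
the filter sees the read buffer together with its file offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of the metadata region within the file. */
struct byte_range {
	size_t start;	/* first byte of the metadata (inclusive) */
	size_t end;	/* one past the last byte (exclusive) */
};

/*
 * buf holds file bytes [off, off + len). Zero whatever part of it
 * overlaps the metadata range and return the number of bytes zeroed.
 */
static size_t scrub_range(unsigned char *buf, size_t off, size_t len,
			  struct byte_range meta)
{
	size_t read_end = off + len;
	size_t lo = meta.start > off ? meta.start : off;
	size_t hi = meta.end < read_end ? meta.end : read_end;

	if (lo >= hi)
		return 0;	/* the read does not touch the metadata */

	memset(buf + (lo - off), 0, hi - lo);
	return hi - lo;
}
```

In this sketch every read that misses the range returns early without
touching the buffer, which matches the idea that only the differing
behavior needs code and everything else passes through.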

A bit off topic (although I haven't looked closely into the FUSE BPF
internals): after roughly listening to this topic in the FS track last
week, I'm not quite sure (at least in the long term) whether it might be
better if eBPF-related filter/redirect stuff could land in vfs or in a
somewhat stackable fs, so that we could redirect/filter any sub-fstree
in principle.  It's just an open question and I have no strong
preference here, but do we really need BPF-filter functionality for
each individual fs?

I think that is a valid question, but the answer is that even if it makes sense,
doing something like this in vfs would be a much bigger project with larger
consequences on performance and security and whatnot, so even if
(and a very big if) this ever happens, using FUSE-BPF as a playground for
this sort of stuff would be a good idea.

My current observation is that the total Fuse-BPF LoC is already beyond the

                         ^ Sorry, I double-checked and I was wrong; forget about it.

whole FUSE itself.  In addition, it hooks almost all fs operations, which
gives me some pause.

This reminds me of union mounts - it made sense to have union mount
functionality in vfs, but after a long winding road, a stacked fs (overlayfs)
turned out to be a much more practical solution.

Yeah, I agree.  So it was just a pure hint on my side.

It sounds much like
https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers

Nice reference.
I must admit that I found it hard to understand what Windows filter drivers
can do compared to the FUSE-BPF design.
It'd be nice to get some comparison with what is planned for FUSE-BPF.

At least some investigation/analysis first might be better for
long-term development.

Interesting to note that there is a "legacy" Windows filter driver API,
so Windows didn't get everything right for the first API - that is especially
interesting to look at as repeating other people's mistakes would be a shame.

I'm not familiar with those details either, but I saw that they have a
filesystem filter subsystem, so I mentioned it here.

Thanks,
Gao Xiang

Thanks,
Amir.