On Tue, May 2, 2023 at 6:38 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Apr 17, 2023 at 06:40:08PM -0700, Daniel Rosenberg wrote:
> > Fuse-bpf provides a short circuit path for Fuse implementations that act
> > as a stacked filesystem. For cases that are directly unchanged,
> > operations are passed directly to the backing filesystem. Small
> > adjustments can be handled by bpf prefilters or postfilters, with the
> > option to fall back to userspace as needed.
>
> Here is my understanding of fuse-bpf design:
> - bpf progs can mostly read-only access fuse_args before and after proper vfs
>   operation on a backing path/file/inode.
> - args are unconditionally prepared for bpf prog consumption, but progs won't
>   be doing anything with them most of the time.
> - progs unfortunately cannot do any real work. they're nothing but simple filters.
>   They can give 'green light' for a fuse_FOO op to be delegated to proper vfs_FOO
>   in backing file. The logic in this patch keeps track of backing_path/file/inode.
> - in other words bpf side is "dumb", but it's telling kernel what to do with
>   real things like path/file/inode and the kernel is doing real work and calling vfs_*.
>
> This design adds non-negligible overhead to fuse when CONFIG_FUSE_BPF is set.
> Comparing to trip to user space it's close to zero, but the cost of
> initialize_in/out + backing + finalize is not free.
> The patch 33 is especially odd.
> fuse has a traditional mechanism to upcall to user space with fuse_simple_request.
> The patch 33 allows bpf prog to return special return value and trigger two more
> fuse_bpf_simple_request-s to user space. Not clear why.
> It seems to me that the main assumption of the fuse bpf design is that bpf prog
> has to stay short and simple. It cannot do much other than reading and comparing
> strings with the help of dynptr.
> How about we allow bpf attach to fuse_simple_request and nothing else?
> All fuse ops call it anyway and cmd is already encoded in the args.
> Then let bpf prog read fuse_args as-is (without converting them to bpf_fuse_args)
> and avoid doing actual fuse_req to user space.
> Also allow bpf prog acquire and remember path/file/inode.
> The verifier is already smart enough to track that the prog is doing it safely
> without leaking references and what not.
> And, of course, allow bpf prog call vfs_* via kfuncs.
> In other words, instead of hard coding
> +#define bpf_fuse_backing(inode, io, out, \
> +                        initialize_in, initialize_out, \
> +                        backing, finalize, args...) \
> one for each fuse_ops in the kernel let bpf prog do the same but on demand.
> The biggest advantage is that this patch set instead of 95% on fuse side and 5% on bpf
> will become 5% addition to fuse code. All the logic will be handled purely by bpf.
> Right now you're limiting it to one backing_file per fuse_file.
> With bpf prog driving it the prog can keep multiple backing_files and shuffle
> access to them as prog decides.
> Instead of doing 'return BPF_FUSE_CONTINUE' the bpf progs will
> pass 'path' to kfunc bpf_vfs_open, than stash 'struct bpf_file*', etc.
> Probably will be easier to white board this idea during lsfmmbpf.
>

I have to admit that sounds a bit challenging, but I'm up for sitting
in front of that whiteboard :)

BTW, thanks Daniel (Borkmann) for sorting out the cross track sessions
for FS-BPF.
We have another FS-only session on FUSE-BPF, but I feel there is plenty
to discuss on the FUSE-bypass part, as well as on the BPF part.
The same goes for the session on BPF iterators for filesystems.

Thanks,
Amir.
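
For anyone following the thread, the filter flow described in the quoted
text (prefilter, backing operation, postfilter, with a fallback to the
userspace daemon) can be modelled roughly in plain user-space C. This is
only a sketch under stated assumptions: the verdict enum, struct op_args
and every function name below are hypothetical stand-ins, not taken from
the fuse-bpf patches or from any kernel API.

```c
#include <string.h>

/* Hypothetical verdicts a filter may return; not the patch set's values. */
enum verdict { FILTER_CONTINUE, FILTER_USERSPACE };

/* Hypothetical stand-in for the arguments of a single fuse operation. */
struct op_args {
    const char *name;   /* path component the operation acts on */
    int result;         /* filled in by whichever side handles the op */
};

typedef enum verdict (*filter_fn)(struct op_args *);

/* Stand-in for the kernel calling vfs_FOO on the backing file. */
static void backing_op(struct op_args *a)
{
    a->result = (int)strlen(a->name);
}

/* Stand-in for the traditional upcall to the userspace daemon. */
static void userspace_op(struct op_args *a)
{
    a->result = -1;
}

/* Example prefilter: names starting with '.' are sent to userspace. */
static enum verdict prefilter(struct op_args *a)
{
    return a->name[0] == '.' ? FILTER_USERSPACE : FILTER_CONTINUE;
}

/* Example postfilter: accepts whatever the backing operation produced. */
static enum verdict postfilter(struct op_args *a)
{
    (void)a;
    return FILTER_CONTINUE;
}

/*
 * Run one operation through the filters.
 * Returns 1 if the backing path handled it, 0 if it fell back to userspace.
 */
static int dispatch(struct op_args *a, filter_fn pre, filter_fn post)
{
    if (pre && pre(a) == FILTER_USERSPACE) {
        userspace_op(a);
        return 0;
    }
    backing_op(a);
    if (post && post(a) == FILTER_USERSPACE) {
        userspace_op(a);
        return 0;
    }
    return 1;
}
```

In this model the filters only return a verdict and all of the real work
stays in dispatch(), which is the "dumb bpf side" arrangement described in
the quoted text. The alternative proposed there would move the backing_op()
call into the filter itself.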