On Wed, Apr 26, 2023 at 9:18 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > Have you considered grouping this huge amount of callbacks into a > smaller set of more generic callbacks where each callback would get > enum argument specifying what sort of operation it is called for? This > has many advantages, starting from not having to deal with struct_ops > limits, ending with not needing to instantiate dozens of individual > BPF programs. > > E.g., for a lot of operations the difference between pre- and > post-filter is in having in argument as read-only and maybe having > extra out argument for post-filter. One way to unify such post/pre > filters into one callback would be to record whether in has to be > read-only or read-write and not allow to create r/w dynptr for the > former case. Pass bool or enum specifying if it's post or pre filter. > For that optional out argument, you can simulate effectively the same > by always supplying it, but making sure that out parameter is > read-only and zero-sized, for example. > > That would cut the number of callbacks in two, which I'd say still is > not great :) I think it would be better still to have even larger > groups of callbacks for whole families of operations with the same (or > "unifiable") interface (domain experts like you would need to do an > analysis here to see what makes sense to group, of course). > > We'll probably touch on that tomorrow at BPF office hours, but I > wanted to point this out beforehand, so that you have time to think > about it. > The meta info struct we pass in includes the opcode which contains whether it is a prefilter or postfilter, although I guess that may be less accessible to the verifier than a separate bool. In the v1 version, we handled all op codes in a single program, although I think we were running into some slowdowns when we had every opcode in a giant switch statement, plus we were incurring the cost of the bpf program even when we didn't need to do anything in it. The struct_op version lets us entirely skip calling the bpf for opcodes we don't need to handle. Many of the arguments we pass currently are structs. If they were all dynptrs, we could set the output related ones to empty/readonly, but that removes one of the other strengths of the struct_op setup, where we can actually label the inputs as the structs they are instead of a void* equivalent. There are definitely some cases where we could easily merge opcode callbacks, like FUSE_FSYNCDIR/FUSE_FSYNC and FUSE_OPEN/FUSE_OPENDIR. I set them up as separate since it's easy to assign the same program to both callbacks in the case where you want both to be handled the same, while maintaining flexibility to handle them separately. > +void bpf_fuse_get_rw_dynptr(struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit, u64 size, bool copy) > > not clear why size is passed from outside instead of instantiating > dynptr with buffer->size? See [0] for bpf_dynptr_adjust and > bpf_dynptr_clone that allow you to adjust buffer as necessary. > > As for the copy parameter, can you elaborate on the idea behind it? > > [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=741584&state=* > We're storing these buffers as fuse_buffers initially because of the additional metadata we're carrying. Some fields have variable lengths, or are backed by const data. For instance, names. If you wanted to alter the name we use on the lower filesystem, you cannot change it directly since it's being backed by the dentry name. If you wanted to adjust something, like perhaps adding an extension, you would pass bpf_fuse_get_rw_dynptr the size you'd want for the new buffer, and copy=true to get the preexisting data. Fuse_buffer tracks that data was allocated so Fuse can clean up after the call. Additionally, say you wanted to trim half the data returned by an xattr for some reason. You would give it a size less than the buffer size to inform fuse that it should ignore the second half of the data. That part could be handled by bpf_dynptr_adjust if we didn't also need to handle the allocation case. Say you wanted to have the lower file name be the hash of the one you created. In that case, you could get bpf_fuse_get_ro_dynptr to get access to compute the hash, and then bpf_fuse_get_rw_dynptr to get a buffer to write the hash to. Since the data is not directly related to the original data, there would be no benefit to getting a copy. I initially intended for bpf_fuse_get_ro_dynptr/bpf_fuse_get_rw_dynptr to be called at most once for each field, but that may be too restrictive. At the moment, if you make two calls that require reallocating, any pointers to the old buffer would be invalid. This is not the case for the original name, as we aren't touching the original source. There are two possible approaches here. I could either refcount the buffer and have a put kfunc, or I could invalidate old dynpointers when bpf_fuse_get_rw_dynptr is called, similar to what skb/xdp do. I'm leaning towards the latter to disallow having many allocations active at once by calling bpf_fuse_get_rw_dynptr for increasing sizes, though I could also just disallow reallocating a buffer that already was reallocated. The new dynptr helpers are pretty exciting since they'll make it much easier to deal with chunks of data, which we may end up doing in read/write filters. I haven't fully set those up since I was waiting to see what the dynptr helpers ended up looking like. > > +void bpf_fuse_get_ro_dynptr(const struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit) > > these kfuncs probably should be more consistently named as > bpf_dynptr_from_fuse_buffer_{ro,rw}() ? > Yeah, that fits in much better with the skb/xdp functions. > > + > > +uint32_t bpf_fuse_return_len(struct fuse_buffer *buffer) > > +{ > > + return buffer->size; > > you should be able to get this with bpf_dynptr_size() (once you create > it from fuse_buffer). > Yes, this might be unnecessary. I added it while testing kfuncs, and had intended to use it with a fuse_buffer strncmp before I saw that there's now a bpf_strncmp :) I had tried using it with bpf_dynptr_slice, but that requires a known constant at verification time, which may make using it in real cases a bit difficult... bpf_strncmp also has some restrictions around the second string being a fixed map, or something like that. > > you probably should be fine with just using bpf_tracing_func_proto as is > > > + .is_valid_access = bpf_fuse_is_valid_access, > > similarly, why custom no-op callback? > Those are largely carried over from iterations when I was less sure what I would need. A lot of the work I was doing in the v1 code is handled by default with the struct_op setup now, or is otherwise unnecessary. This area in particular needs a lot of cleanup. > > +static int bpf_fuse_init(struct btf *btf) > > +{ > > + s32 type_id; > > + > > + type_id = btf_find_by_name_kind(btf, "fuse_buffer", BTF_KIND_STRUCT); > > + if (type_id < 0) > > + return -EINVAL; > > + fuse_buffer_struct_type = btf_type_by_id(btf, type_id); > > + > > see BTF_ID and BTF_ID_LIST uses for how to get ID for your custom > well-known type > Thanks, I'll look into those.