On Tue, May 2, 2023 at 6:53 PM Daniel Rosenberg <drosen@xxxxxxxxxx> wrote: > > On Wed, Apr 26, 2023 at 9:18 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > Have you considered grouping this huge amount of callbacks into a > > smaller set of more generic callbacks where each callback would get > > enum argument specifying what sort of operation it is called for? This > > has many advantages, starting from not having to deal with struct_ops > > limits, ending with not needing to instantiate dozens of individual > > BPF programs. > > > > E.g., for a lot of operations the difference between pre- and > > post-filter is in having in argument as read-only and maybe having > > extra out argument for post-filter. One way to unify such post/pre > > filters into one callback would be to record whether in has to be > > read-only or read-write and not allow to create r/w dynptr for the > > former case. Pass bool or enum specifying if it's post or pre filter. > > For that optional out argument, you can simulate effectively the same > > by always supplying it, but making sure that out parameter is > > read-only and zero-sized, for example. > > > > That would cut the number of callbacks in two, which I'd say still is > > not great :) I think it would be better still to have even larger > > groups of callbacks for whole families of operations with the same (or > > "unifiable") interface (domain experts like you would need to do an > > analysis here to see what makes sense to group, of course). > > > > We'll probably touch on that tomorrow at BPF office hours, but I > > wanted to point this out beforehand, so that you have time to think > > about it. > > > > The meta info struct we pass in includes the opcode which contains > whether it is a prefilter or postfilter, although I guess that may be > less accessible to the verifier than a separate bool. In the v1 > version, we handled all op codes in a single program, although I think > we were running into some slowdowns when we had every opcode in a > giant switch statement, plus we were incurring the cost of the bpf > program even when we didn't need to do anything in it. The struct_op > version lets us entirely skip calling the bpf for opcodes we don't > need to handle. > > Many of the arguments we pass currently are structs. If they were all > dynptrs, we could set the output related ones to empty/readonly, but > that removes one of the other strengths of the struct_op setup, where > we can actually label the inputs as the structs they are instead of a > void* equivalent. There are definitely some cases where we could > easily merge opcode callbacks, like FUSE_FSYNCDIR/FUSE_FSYNC and > FUSE_OPEN/FUSE_OPENDIR. I set them up as separate since it's easy to > assign the same program to both callbacks in the case where you want > both to be handled the same, while maintaining flexibility to handle > them separately. If combining hooks doesn't bring any value and simplification, I think it's fine to keep it as is. I was mostly probing if there is an equally convenient, but more succinct API that could be exposed through struct_ops. If there is none, then it's fine. > > > +void bpf_fuse_get_rw_dynptr(struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit, u64 size, bool copy) > > > > not clear why size is passed from outside instead of instantiating > > dynptr with buffer->size? See [0] for bpf_dynptr_adjust and > > bpf_dynptr_clone that allow you to adjust buffer as necessary. > > > > As for the copy parameter, can you elaborate on the idea behind it? > > > > [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=741584&state=* > > > > We're storing these buffers as fuse_buffers initially because of the > additional metadata we're carrying. Some fields have variable lengths, > or are backed by const data. For instance, names. If you wanted to > alter the name we use on the lower filesystem, you cannot change it > directly since it's being backed by the dentry name. If you wanted to > adjust something, like perhaps adding an extension, you would pass > bpf_fuse_get_rw_dynptr the size you'd want for the new buffer, and > copy=true to get the preexisting data. Fuse_buffer tracks that data > was allocated so Fuse can clean up after the call. Additionally, say > you wanted to trim half the data returned by an xattr for some reason. > You would give it a size less than the buffer size to inform fuse that > it should ignore the second half of the data. That part could be > handled by bpf_dynptr_adjust if we didn't also need to handle the > allocation case. Interesting point about allocations and needing to realloc names. But I wonder if it makes more sense to split the copy/reallocation part and do it with separate kfunc. And leave dynptr only as means to work with that data. So you'd do something like below for read/write case: bpf_fuse_buf_clone(&buffer, new_size); bpf_fuse_dynptr_from_buf_rw(&buffer, &dynptr); But would skip bpf_fuse_buf_clone() if you only ever read: bpf_fuse_dynptr_from_buf_ro(&buffer, &dynptr); If fuse_buffer was never cloned/realloced, then bpf_fuse_dynptr_from_buf_rw() should just fail and return invalid dynptr. > Say you wanted to have the lower file name be the hash of the one you > created. In that case, you could get bpf_fuse_get_ro_dynptr to get > access to compute the hash, and then bpf_fuse_get_rw_dynptr to get a > buffer to write the hash to. Since the data is not directly related to > the original data, there would be no benefit to getting a copy. > > I initially intended for bpf_fuse_get_ro_dynptr/bpf_fuse_get_rw_dynptr > to be called at most once for each field, but that may be too > restrictive. At the moment, if you make two calls that require > reallocating, any pointers to the old buffer would be invalid. This is > not the case for the original name, as we aren't touching the original > source. There are two possible approaches here. I could either > refcount the buffer and have a put kfunc, or I could invalidate old > dynpointers when bpf_fuse_get_rw_dynptr is called, similar to what > skb/xdp do. I'm leaning towards the latter to disallow having many > allocations active at once by calling bpf_fuse_get_rw_dynptr for > increasing sizes, though I could also just disallow reallocating a > buffer that already was reallocated. Yes, invalidating dynptrs sounds like a way to go. But I think instead of bundling all that into dynptr constructor for fuse_buffer, it's better to have a separate kfunc that would be doing realloc/cloning *and* invalidating. Other than that, neither from_buf_rw nor from_buf_ro should be doing invalidation, because they can't cause realloc. WDYT? > > The new dynptr helpers are pretty exciting since they'll make it much > easier to deal with chunks of data, which we may end up doing in > read/write filters. I haven't fully set those up since I was waiting > to see what the dynptr helpers ended up looking like. > Great, let us know how it goes in practice to start using them. > > > > +void bpf_fuse_get_ro_dynptr(const struct fuse_buffer *buffer, struct bpf_dynptr_kern *dynptr__uninit) > > > > these kfuncs probably should be more consistently named as > > bpf_dynptr_from_fuse_buffer_{ro,rw}() ? > > > Yeah, that fits in much better with the skb/xdp functions. great > > > > + > > > +uint32_t bpf_fuse_return_len(struct fuse_buffer *buffer) > > > +{ > > > + return buffer->size; > > > > you should be able to get this with bpf_dynptr_size() (once you create > > it from fuse_buffer). > > > > Yes, this might be unnecessary. I added it while testing kfuncs, and > had intended to use it with a fuse_buffer strncmp before I saw that > there's now a bpf_strncmp :) I had tried using it with > bpf_dynptr_slice, but that requires a known constant at verification > time, which may make using it in real cases a bit difficult... > bpf_strncmp also has some restrictions around the second string being > a fixed map, or something like that. right, we might need a more flexible strncmp version working with two dynptrs and not assuming a fixed string. We didn't have dynptr abstraction for working with variable-sized memory when we were adding bpf_strncmp. > > > > > you probably should be fine with just using bpf_tracing_func_proto as is > > > > > + .is_valid_access = bpf_fuse_is_valid_access, > > > > similarly, why custom no-op callback? > > > > Those are largely carried over from iterations when I was less sure > what I would need. A lot of the work I was doing in the v1 code is > handled by default with the struct_op setup now, or is otherwise > unnecessary. This area in particular needs a lot of cleanup. > ok > > > +static int bpf_fuse_init(struct btf *btf) > > > +{ > > > + s32 type_id; > > > + > > > + type_id = btf_find_by_name_kind(btf, "fuse_buffer", BTF_KIND_STRUCT); > > > + if (type_id < 0) > > > + return -EINVAL; > > > + fuse_buffer_struct_type = btf_type_by_id(btf, type_id); > > > + > > > > see BTF_ID and BTF_ID_LIST uses for how to get ID for your custom > > well-known type > > > Thanks, I'll look into those.