On Tue, Jun 25, 2024 at 7:41 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > Hi Andrii, > > On 6/25/2024 11:55 AM, Andrii Nakryiko wrote: > > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >> Hi, > >> > >> Sorry to resurrect the old thread to continue the discussion of APIs for > >> qp-trie. > >> > >> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: > >>> On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >>>> Hi, > >>>> > >> SNIP > >> > >>>> updated to allow using dynptr as map key for qp-trie. > >>>>> And that's the problem I just mentioned. > >>>>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also > >>>>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). > >>>> Sorry for misunderstanding your reply. But before switch to the kfuncl > >>>> way, could you please point me to some code or function which shows the > >>>> specialty of PTR_MAP_KEY ? > >>>> > >>>> > >>> Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The > >>> logic assumes that there is associated struct bpf_map * pointer from > >>> which we know fixed-sized key length. > >>> > >>> But getting back to the topic at hand. I vaguely remember discussion > >>> we had, but it would be good if you could summarize it again here to > >>> avoid talking past each other. What is the bpf_map_ops changes you > >>> were thinking to do? How bpf_attr will look like? How BPF-side API for > >>> lookup/delete/update will look like? And then let's go from there? > >>> Thanks! > >>> > >>> . > >> The APIs for qp-trie are composed of the followings 5 parts: > >> > >> (1) map definition for qp-trie > >> > >> The key is bpf_dynptr and map_extra specifies the max length of key. > >> > >> struct { > >> __uint(type, BPF_MAP_TYPE_QP_TRIE); > >> __type(key, struct bpf_dynptr); > > I'm not sure we need `struct bpf_dynptr` as the key type. We can just > > say that key_size has to be zero, and actual keys are variable-sized. > > > > Alternatively, we can treat key_size as "maximum key size", any > > attempt to use longer keys will be rejected. > > > > But in either case "struct bpf_dynptr" as key type seems wrong to me. > > The use of bpf_dynptr services two purposes: > (1) tell bpf subsystem that qp-trie is a map with variable-size key. > If don't use bpf_dynptr, the purpose can be accomplished by checking the > map_type. > > (2) facilitate the dump of key in bpftool when btf is available > when dump the key & value tuple through btf dump, a btf_type is needed > for the key. Because the key is variable-size, so neither using void > type (key_size =0 case) nor using the type with the maximal key size > are appropriate. But the use of bpf_dynptr can also be avoided, if we > add a special case for qp-trie when dumping its key. > both of those purposes are served just as well with something like map_flags = BPF_F_VARLEN_KEY (naming made up for illustration purposes only) > Setting key_size as zero seems weird, because qp-trie has key. I prefer > to set key_size as the maximal key size and set the btf_key_id as 0. ok, I don't think either way is better/worse than the other > >> __type(value, unsigned int); > >> __uint(map_flags, BPF_F_NO_PREALLOC); > >> __uint(map_extra, 1024); > >> } qp_trie SEC(".maps"); > >> > >> (2) bpf_attr > >> > >> Add key_sz & next_key_sz into anonymous struct to support map with > >> variable-size key. We could add value_sz if the map with variable-size > >> value is supported in the future. > >> > >> struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ > >> __u32 map_fd; > >> __aligned_u64 key; > >> union { > >> __aligned_u64 value; > >> __aligned_u64 next_key; > >> }; > >> __u64 flags; > >> __u32 key_sz; > >> __u32 next_key_sz; > >> }; > >> > > Yep, this seems inevitable. And yes, value_sz seems like a reasonable > > thing to have. It might be an option/flag whether QP-trie has > > fixed-sized or variable-sized value, I guess. But we can get there > > after all the other things are figured out. > > Do we need to add value_sz with the qp-trie patch-set or later ? I > prefer to leave it as the future work. > > future work is probably fine, but we need to design with this possibility in mind, that's all > >> (3) libbpf API > >> > >> Add bpf_map__get_next_sized_key() to high level APIs. > > All the *_sized_* names are... unfortunate, tbh. I'm not sure what's > > the right naming, but "sized" in the middle doesn't seem that. I think > > it should be a uniform suffix. Maybe something like "_varsz", > > "_varlen", or at least "_sized" (but as a suffix)?... > > I see. I have considered bpf_map_update_vs_key_elem() or similar, but I > changed it to _sized later. Because bpf_map_update_sized_elem() not only > supports variable-sized key, but also supports fixed-size key, so I > think it is a bit weird that the high level API bpf_map__update_elem() > invokes bpf_map__update_elem_varlen() in turn to update element for map > with fixed-size key. I will try to add _sized as a suffix in the formal > patch set. > > > >> LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map, > >> const void *cur_key, > >> size_t cur_key_sz, > >> void *next_key, size_t > >> *next_key_sz); > > SNIP > >> > >> > >> > >> (4) bpf_map_ops > >> > >> Update the arguments for map_get_next_key()/map_lookup_elem_sys_only(). > >> Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops. > >> > >> Updating map_update_elem()/map_delete_elem() is also fine, but it may > >> introduce too much churn and need to pass map->key_size to these APIs > >> for existing callers. > >> > > We can have a protocol that key_size and value_size might be zero for > > fixed-sized maps, in which case key/value size is not > > checked/enforced, right? > > Yes. We could pass 0 as key_size for the existing callers of > ->map_update_elem()/->map_delete_elem() > > > > I think it's much better to keep one universal interface that works > > for both fixed- and variable-sized map (especially that we can > > technically have maps where fixed-sized or variable-sized is a matter > > of choice and some map_flag value). > > I see your point. Will update > map_update_elem()/map_delete_elem()/map_lookup_elem() instead. > > > >> struct bpf_map_ops { > >> int (*map_get_next_key)(struct bpf_map *map, void *key, u32 > >> key_size, void *next_key, u32 *next_key_size); > >> void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void > >> *key, u32 key_size); > >> > >> int (*map_update_elem_sys_only)(struct bpf_map *map, void *key, > >> u32 key_size, void *value, u64 flags); > >> int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key, > >> u32 key_size); > >> }; > >> > >> (5) API for bpf program > >> > >> Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three > >> new kfuncs to support lookup/update/deletion operation on qp-trie. > >> > > hopefully those won't be qp-trie specific? Also, are you planning to > > have only key variable-sized or value as well? > > Will make these kfuncs be available for all maps with variable-size key > support. I think it is better to have both variable-sized key and value > in these kfuncs. WDYT ? yep > > > >