Hi Andrii, On 6/25/2024 11:55 AM, Andrii Nakryiko wrote: > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> Hi, >> >> Sorry to resurrect the old thread to continue the discussion of APIs for >> qp-trie. >> >> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: >>> On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >>>> Hi, >>>> >> SNIP >> >>>> updated to allow using dynptr as map key for qp-trie. >>>>> And that's the problem I just mentioned. >>>>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also >>>>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). >>>> Sorry for misunderstanding your reply. But before switch to the kfuncl >>>> way, could you please point me to some code or function which shows the >>>> specialty of PTR_MAP_KEY ? >>>> >>>> >>> Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The >>> logic assumes that there is associated struct bpf_map * pointer from >>> which we know fixed-sized key length. >>> >>> But getting back to the topic at hand. I vaguely remember discussion >>> we had, but it would be good if you could summarize it again here to >>> avoid talking past each other. What is the bpf_map_ops changes you >>> were thinking to do? How bpf_attr will look like? How BPF-side API for >>> lookup/delete/update will look like? And then let's go from there? >>> Thanks! >>> >>> . >> The APIs for qp-trie are composed of the followings 5 parts: >> >> (1) map definition for qp-trie >> >> The key is bpf_dynptr and map_extra specifies the max length of key. >> >> struct { >> __uint(type, BPF_MAP_TYPE_QP_TRIE); >> __type(key, struct bpf_dynptr); > I'm not sure we need `struct bpf_dynptr` as the key type. We can just > say that key_size has to be zero, and actual keys are variable-sized. > > Alternatively, we can treat key_size as "maximum key size", any > attempt to use longer keys will be rejected. > > But in either case "struct bpf_dynptr" as key type seems wrong to me. The use of bpf_dynptr services two purposes: (1) tell bpf subsystem that qp-trie is a map with variable-size key. If don't use bpf_dynptr, the purpose can be accomplished by checking the map_type. (2) facilitate the dump of key in bpftool when btf is available when dump the key & value tuple through btf dump, a btf_type is needed for the key. Because the key is variable-size, so neither using void type (key_size =0 case) nor using the type with the maximal key size are appropriate. But the use of bpf_dynptr can also be avoided, if we add a special case for qp-trie when dumping its key. Setting key_size as zero seems weird, because qp-trie has key. I prefer to set key_size as the maximal key size and set the btf_key_id as 0. >> __type(value, unsigned int); >> __uint(map_flags, BPF_F_NO_PREALLOC); >> __uint(map_extra, 1024); >> } qp_trie SEC(".maps"); >> >> (2) bpf_attr >> >> Add key_sz & next_key_sz into anonymous struct to support map with >> variable-size key. We could add value_sz if the map with variable-size >> value is supported in the future. >> >> struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ >> __u32 map_fd; >> __aligned_u64 key; >> union { >> __aligned_u64 value; >> __aligned_u64 next_key; >> }; >> __u64 flags; >> __u32 key_sz; >> __u32 next_key_sz; >> }; >> > Yep, this seems inevitable. And yes, value_sz seems like a reasonable > thing to have. It might be an option/flag whether QP-trie has > fixed-sized or variable-sized value, I guess. But we can get there > after all the other things are figured out. Do we need to add value_sz with the qp-trie patch-set or later ? I prefer to leave it as the future work. > >> (3) libbpf API >> >> Add bpf_map__get_next_sized_key() to high level APIs. > All the *_sized_* names are... unfortunate, tbh. I'm not sure what's > the right naming, but "sized" in the middle doesn't seem that. I think > it should be a uniform suffix. Maybe something like "_varsz", > "_varlen", or at least "_sized" (but as a suffix)?... I see. I have considered bpf_map_update_vs_key_elem() or similar, but I changed it to _sized later. Because bpf_map_update_sized_elem() not only supports variable-sized key, but also supports fixed-size key, so I think it is a bit weird that the high level API bpf_map__update_elem() invokes bpf_map__update_elem_varlen() in turn to update element for map with fixed-size key. I will try to add _sized as a suffix in the formal patch set. > >> LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map, >> const void *cur_key, >> size_t cur_key_sz, >> void *next_key, size_t >> *next_key_sz); SNIP >> >> >> >> (4) bpf_map_ops >> >> Update the arguments for map_get_next_key()/map_lookup_elem_sys_only(). >> Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops. >> >> Updating map_update_elem()/map_delete_elem() is also fine, but it may >> introduce too much churn and need to pass map->key_size to these APIs >> for existing callers. >> > We can have a protocol that key_size and value_size might be zero for > fixed-sized maps, in which case key/value size is not > checked/enforced, right? Yes. We could pass 0 as key_size for the existing callers of ->map_update_elem()/->map_delete_elem() > > I think it's much better to keep one universal interface that works > for both fixed- and variable-sized map (especially that we can > technically have maps where fixed-sized or variable-sized is a matter > of choice and some map_flag value). I see your point. Will update map_update_elem()/map_delete_elem()/map_lookup_elem() instead. > >> struct bpf_map_ops { >> int (*map_get_next_key)(struct bpf_map *map, void *key, u32 >> key_size, void *next_key, u32 *next_key_size); >> void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void >> *key, u32 key_size); >> >> int (*map_update_elem_sys_only)(struct bpf_map *map, void *key, >> u32 key_size, void *value, u64 flags); >> int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key, >> u32 key_size); >> }; >> >> (5) API for bpf program >> >> Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three >> new kfuncs to support lookup/update/deletion operation on qp-trie. >> > hopefully those won't be qp-trie specific? Also, are you planning to > have only key variable-sized or value as well? Will make these kfuncs be available for all maps with variable-size key support. I think it is better to have both variable-sized key and value in these kfuncs. WDYT ? >