On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > Hi, > > Sorry to resurrect the old thread to continue the discussion of APIs for > qp-trie. > > On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: > > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > >> Hi, > >> > > SNIP > > >> updated to allow using dynptr as map key for qp-trie. > >>> And that's the problem I just mentioned. > >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also > >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). > >> Sorry for misunderstanding your reply. But before switch to the kfuncl > >> way, could you please point me to some code or function which shows the > >> specialty of PTR_MAP_KEY ? > >> > >> > > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The > > logic assumes that there is associated struct bpf_map * pointer from > > which we know fixed-sized key length. > > > > But getting back to the topic at hand. I vaguely remember discussion > > we had, but it would be good if you could summarize it again here to > > avoid talking past each other. What is the bpf_map_ops changes you > > were thinking to do? How bpf_attr will look like? How BPF-side API for > > lookup/delete/update will look like? And then let's go from there? > > Thanks! > > > > . > > The APIs for qp-trie are composed of the followings 5 parts: > > (1) map definition for qp-trie > > The key is bpf_dynptr and map_extra specifies the max length of key. > > struct { > __uint(type, BPF_MAP_TYPE_QP_TRIE); > __type(key, struct bpf_dynptr); I'm not sure we need `struct bpf_dynptr` as the key type. We can just say that key_size has to be zero, and actual keys are variable-sized. Alternatively, we can treat key_size as "maximum key size", any attempt to use longer keys will be rejected. But in either case "struct bpf_dynptr" as key type seems wrong to me. > __type(value, unsigned int); > __uint(map_flags, BPF_F_NO_PREALLOC); > __uint(map_extra, 1024); > } qp_trie SEC(".maps"); > > (2) bpf_attr > > Add key_sz & next_key_sz into anonymous struct to support map with > variable-size key. We could add value_sz if the map with variable-size > value is supported in the future. > > struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ > __u32 map_fd; > __aligned_u64 key; > union { > __aligned_u64 value; > __aligned_u64 next_key; > }; > __u64 flags; > __u32 key_sz; > __u32 next_key_sz; > }; > Yep, this seems inevitable. And yes, value_sz seems like a reasonable thing to have. It might be an option/flag whether QP-trie has fixed-sized or variable-sized value, I guess. But we can get there after all the other things are figured out. > (3) libbpf API > > Add bpf_map__get_next_sized_key() to high level APIs. All the *_sized_* names are... unfortunate, tbh. I'm not sure what's the right naming, but "sized" in the middle doesn't seem that. I think it should be a uniform suffix. Maybe something like "_varsz", "_varlen", or at least "_sized" (but as a suffix)?... > > LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map, > const void *cur_key, > size_t cur_key_sz, > void *next_key, size_t > *next_key_sz); > > Add > bpf_map_update_sized_elem()/bpf_map_lookup_sized_elem()/bpf_map_delete_sized_elem()/bpf_map_get_next_sized_key() > to low level APIs. > These APIs have already considered the case in which map has > variable-size value, so there will be no need to add other new APIs to > support such case. > > LIBBPF_API int bpf_map_update_sized_elem(int fd, const void *key, size_t > key_sz, > const void *value, size_t value_sz, > __u64 flags); > LIBBPF_API int bpf_map_lookup_sized_elem(int fd, const void *key, size_t > key_sz, > void *value, size_t *value_sz, > __u64 flags); > LIBBPF_API int bpf_map_delete_sized_elem(int fd, const void *key, size_t > key_sz, > __u64 flags); > LIBBPF_API int bpf_map_get_next_sized_key(int fd, > const void *key, size_t key_sz, > void *next_key, size_t > *next_key_sz); > > (4) bpf_map_ops > > Update the arguments for map_get_next_key()/map_lookup_elem_sys_only(). > Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops. > > Updating map_update_elem()/map_delete_elem() is also fine, but it may > introduce too much churn and need to pass map->key_size to these APIs > for existing callers. > We can have a protocol that key_size and value_size might be zero for fixed-sized maps, in which case key/value size is not checked/enforced, right? I think it's much better to keep one universal interface that works for both fixed- and variable-sized map (especially that we can technically have maps where fixed-sized or variable-sized is a matter of choice and some map_flag value). > struct bpf_map_ops { > int (*map_get_next_key)(struct bpf_map *map, void *key, u32 > key_size, void *next_key, u32 *next_key_size); > void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void > *key, u32 key_size); > > int (*map_update_elem_sys_only)(struct bpf_map *map, void *key, > u32 key_size, void *value, u64 flags); > int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key, > u32 key_size); > }; > > (5) API for bpf program > > Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three > new kfuncs to support lookup/update/deletion operation on qp-trie. > hopefully those won't be qp-trie specific? Also, are you planning to have only key variable-sized or value as well? >