On Wed, Aug 30, 2023 at 11:29 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > Hi Andrii, > > On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: > > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > SNIP > >>>> Yes. bpf prog will use dynptr as the map key. The bpf program will use > >>>> the same map helpers as hash map to operate on qp-trie and the verifier > >>>> will be updated to allow using dynptr as map key for qp-trie. > >>> And that's the problem I just mentioned. > >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also > >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). > >> Sorry for misunderstanding your reply. But before switch to the kfunc > >> way, could you please point me to some code or function which shows the > >> specialty of PTR_MAP_KEY ? > >> > >> > > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The > > logic assumes that there is associated struct bpf_map * pointer from > > which we know fixed-sized key length. > > Thanks for the information. Will check that. > > > > But getting back to the topic at hand. I vaguely remember discussion > > we had, but it would be good if you could summarize it again here to > > avoid talking past each other. What is the bpf_map_ops changes you > > were thinking to do? How bpf_attr will look like? How BPF-side API for > > lookup/delete/update will look like? And then let's go from there? > > Thanks! > > Sorry for the late reply. I am a bit distracted by other work this week. > > For bpf_attr, a new field 'dynkey_size' is added to support > BPF_MAP_{LOOKUP/UPDATE/DELETE}_ELEM and BPF_MAP_GET_NEXT_KEY on qp-trie > as shown below: > > struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ > __u32 map_fd; > __aligned_u64 key; > union { > __aligned_u64 value; > __aligned_u64 next_key; > }; > __u64 flags; > __u32 dynkey_size; /* input/output for > * BPF_MAP_GET_NEXT_KEY. input > * only for other commands. > */ hm.. I wonder if it would be more elegant to add `key_size` and `value_size`, and allow to specify it (optionally) even for maps that have fixed-size keys and values. Return error if expected key/value size doesn't match map definition. From libbpf side, libbpf can be smart to not set it on older kernels (or if user didn't provide this information). But for bpf_map__lookup_elem() and other higher-level APIs, we should have all this information available. > }; > > And 4 new APIs are added in libbpf to support basic operations on qp-trie: > > LIBBPF_API int bpf_map_update_dynkey_elem(int fd, const void *key, > unsigned int key_size, const void *value, __u64 flags); > LIBBPF_API int bpf_map_lookup_dynkey_elem(int fd, const void *key, > unsigned int key_size, void *value); > LIBBPF_API int bpf_map_delete_dynkey_elem(int fd, const void *key, > unsigned int key_size); > LIBBPF_API int bpf_map_get_next_dynkey(int fd, const void *key, void > *next_key, unsigned int *key_size); > > About 3 weeks again, I have used the lowest bit of key pointer in > .map_lookup_elem/.map_update_elem/.map_delete_elem to distinguish > between bpf_user_dynkey-typed key from syscall and bpf_dynptr_kern-typed > key from bpf program. The definition of bpf_user_dynkey and its > allocation method are shown below. bpf syscall uses it to allocate > variable-sized key and passes it to qp-trie. > > /* Allocate bpf_user_dynkey and its data together */ > struct bpf_user_dynkey { > unsigned int size; > void *data; > }; > > static void *bpf_new_user_dynkey(unsigned int size) > { > struct bpf_user_dynkey *dynkey; > size_t total; > > total = round_up(sizeof(*dynkey) + size, 2); > dynkey = kvmalloc(total, GFP_USER | __GFP_NOWARN); > if (!dynkey) > return ERR_PTR(-ENOMEM); > > dynkey->size = size; > dynkey->data = &dynkey[1]; > return (void *)((long)dynkey | BPF_USER_DYNKEY_MARK); > } > > > After Alexei suggested that bit hack is only OK for memory or > performance reason, I'm planning to add 2 new callbacks in bpf_map_ops > to support update/delete operations in bpf syscall as shown below, but I > have tried it yet. > > /* map is generic key/value storage optionally accessible by eBPF > programs */ > struct bpf_map_ops { > /* funcs callable from userspace (via syscall) */ > /* ...... */ > void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); a bit confused, did you mean to also have key_size as a third argument here? > long (*map_update_elem_sys_only)(struct bpf_map *map, void *key, > void *value, u64 flags); > long (*map_delete_elem_sys_only)(struct bpf_map *map, void *key); > /* ...... */ > }; > > > > > > >