Hi, On 9/9/2023 6:34 AM, Andrii Nakryiko wrote: > On Wed, Aug 30, 2023 at 11:29 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> Hi Andrii, >> >> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: >>> On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> SNIP >>>>>> Yes. bpf prog will use dynptr as the map key. The bpf program will use >>>>>> the same map helpers as hash map to operate on qp-trie and the verifier >>>>>> will be updated to allow using dynptr as map key for qp-trie. >>>>> And that's the problem I just mentioned. >>>>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also >>>>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). >>>> Sorry for misunderstanding your reply. But before switch to the kfunc >>>> way, could you please point me to some code or function which shows the >>>> specialty of PTR_MAP_KEY ? >>>> >>>> >>> Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The >>> logic assumes that there is associated struct bpf_map * pointer from >>> which we know fixed-sized key length. >> Thanks for the information. Will check that. >>> But getting back to the topic at hand. I vaguely remember discussion >>> we had, but it would be good if you could summarize it again here to >>> avoid talking past each other. What is the bpf_map_ops changes you >>> were thinking to do? How bpf_attr will look like? How BPF-side API for >>> lookup/delete/update will look like? And then let's go from there? >>> Thanks! >> Sorry for the late reply. I am a bit distracted by other work this week. >> >> For bpf_attr, a new field 'dynkey_size' is added to support >> BPF_MAP_{LOOKUP/UPDATE/DELETE}_ELEM and BPF_MAP_GET_NEXT_KEY on qp-trie >> as shown below: >> >> struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ >> __u32 map_fd; >> __aligned_u64 key; >> union { >> __aligned_u64 value; >> __aligned_u64 next_key; >> }; >> __u64 flags; >> __u32 dynkey_size; /* input/output for >> * BPF_MAP_GET_NEXT_KEY. input >> * only for other commands. >> */ > hm.. I wonder if it would be more elegant to add `key_size` and > `value_size`, and allow to specify it (optionally) even for maps that > have fixed-size keys and values. Return error if expected key/value > size doesn't match map definition. From libbpf side, libbpf can be > smart to not set it on older kernels (or if user didn't provide this > information). But for bpf_map__lookup_elem() and other higher-level > APIs, we should have all this information available. I am OK with the addition of key_size and value_size in bpf_attr and I will try to do that. After the addition, bpf syscall will also need to check these two size for fixed-size map, but I am a bit worried about the compatibility of libbpf and kernel for fixed-size map. There are three possible cases: 1) new libbpf and older kernel. key_size and value_size will be ignored by older kernel, because the definition of bpf_attr in older kernel is shorter. It will be OK. 2) old libbpf and new kernel. key_size and value_size will be zero for kernel, because libbpf doesn't pass these values. We can use 0 as an unspecified size, but for some map (e.g., bloom-filter) zero key_size is also valid, so do we need to introduce a feature bit to tell both key_size and value_size are valid ? 3) matched libbpf and kernel. Same problem as 2): use zero as an unspecified size or an extra flag is needed. I totally missed these higher-level APIs with both key_sz and value_sz. It seems these high-level also need updates to support qp-trie (e.g., bpf_map__get_next_key, because the size of current key and next key are different). > > >> }; >> >> And 4 new APIs are added in libbpf to support basic operations on qp-trie: >> >> LIBBPF_API int bpf_map_update_dynkey_elem(int fd, const void *key, >> unsigned int key_size, const void *value, __u64 flags); >> LIBBPF_API int bpf_map_lookup_dynkey_elem(int fd, const void *key, >> unsigned int key_size, void *value); >> LIBBPF_API int bpf_map_delete_dynkey_elem(int fd, const void *key, >> unsigned int key_size); >> LIBBPF_API int bpf_map_get_next_dynkey(int fd, const void *key, void >> *next_key, unsigned int *key_size); >> >> About 3 weeks again, I have used the lowest bit of key pointer in >> .map_lookup_elem/.map_update_elem/.map_delete_elem to distinguish >> between bpf_user_dynkey-typed key from syscall and bpf_dynptr_kern-typed >> key from bpf program. The definition of bpf_user_dynkey and its >> allocation method are shown below. bpf syscall uses it to allocate >> variable-sized key and passes it to qp-trie. >> >> /* Allocate bpf_user_dynkey and its data together */ >> struct bpf_user_dynkey { >> unsigned int size; >> void *data; >> }; >> >> static void *bpf_new_user_dynkey(unsigned int size) >> { >> struct bpf_user_dynkey *dynkey; >> size_t total; >> >> total = round_up(sizeof(*dynkey) + size, 2); >> dynkey = kvmalloc(total, GFP_USER | __GFP_NOWARN); >> if (!dynkey) >> return ERR_PTR(-ENOMEM); >> >> dynkey->size = size; >> dynkey->data = &dynkey[1]; >> return (void *)((long)dynkey | BPF_USER_DYNKEY_MARK); >> } >> >> >> After Alexei suggested that bit hack is only OK for memory or >> performance reason, I'm planning to add 2 new callbacks in bpf_map_ops >> to support update/delete operations in bpf syscall as shown below, but I >> have tried it yet. >> >> /* map is generic key/value storage optionally accessible by eBPF >> programs */ >> struct bpf_map_ops { >> /* funcs callable from userspace (via syscall) */ >> /* ...... */ >> void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); > a bit confused, did you mean to also have key_size as a third argument here? Ah, I am planning to pass bpf_user_dynkey to these newly-added APIs and bpf_user_dynke::size is the size of the key. Passing a plain pointer and key size is also fine. > >> long (*map_update_elem_sys_only)(struct bpf_map *map, void *key, >> void *value, u64 flags); >> long (*map_delete_elem_sys_only)(struct bpf_map *map, void *key); >> /* ...... */ >> }; >> >> >> >> >> >> >>