On Wed, Jun 26, 2024 at 8:41 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, Jun 26, 2024 at 4:59 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Tue, Jun 25, 2024 at 7:06 PM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Hi, > > > > > > > > Sorry to resurrect the old thread to continue the discussion of APIs for > > > > qp-trie. > > > > > > > > On 8/26/2023 2:33 AM, Andrii Nakryiko wrote: > > > > > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: > > > > >> Hi, > > > > >> > > > > > > > > SNIP > > > > > > > > >> updated to allow using dynptr as map key for qp-trie. > > > > >>> And that's the problem I just mentioned. > > > > >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also > > > > >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type). > > > > >> Sorry for misunderstanding your reply. But before switch to the kfuncl > > > > >> way, could you please point me to some code or function which shows the > > > > >> specialty of PTR_MAP_KEY ? > > > > >> > > > > >> > > > > > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The > > > > > logic assumes that there is associated struct bpf_map * pointer from > > > > > which we know fixed-sized key length. > > > > > > > > > > But getting back to the topic at hand. I vaguely remember discussion > > > > > we had, but it would be good if you could summarize it again here to > > > > > avoid talking past each other. What is the bpf_map_ops changes you > > > > > were thinking to do? How bpf_attr will look like? How BPF-side API for > > > > > lookup/delete/update will look like? And then let's go from there? > > > > > Thanks! > > > > > > > > > > . > > > > > > > > The APIs for qp-trie are composed of the followings 5 parts: > > > > > > > > (1) map definition for qp-trie > > > > > > > > The key is bpf_dynptr and map_extra specifies the max length of key. > > > > > > > > struct { > > > > __uint(type, BPF_MAP_TYPE_QP_TRIE); > > > > __type(key, struct bpf_dynptr); > > > > __type(value, unsigned int); > > > > __uint(map_flags, BPF_F_NO_PREALLOC); > > > > __uint(map_extra, 1024); > > > > } qp_trie SEC(".maps"); > > > > > > > > (2) bpf_attr > > > > > > > > Add key_sz & next_key_sz into anonymous struct to support map with > > > > variable-size key. We could add value_sz if the map with variable-size > > > > value is supported in the future. > > > > > > > > struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ > > > > __u32 map_fd; > > > > __aligned_u64 key; > > > > union { > > > > __aligned_u64 value; > > > > __aligned_u64 next_key; > > > > }; > > > > __u64 flags; > > > > __u32 key_sz; > > > > __u32 next_key_sz; > > > > }; > > > > > > > > (3) libbpf API > > > > > > > > Add bpf_map__get_next_sized_key() to high level APIs. > > > > > > > > LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map, > > > > const void *cur_key, > > > > size_t cur_key_sz, > > > > void *next_key, size_t > > > > *next_key_sz); > > > > > > > > Add > > > > bpf_map_update_sized_elem()/bpf_map_lookup_sized_elem()/bpf_map_delete_sized_elem()/bpf_map_get_next_sized_key() > > > > to low level APIs. > > > > These APIs have already considered the case in which map has > > > > variable-size value, so there will be no need to add other new APIs to > > > > support such case. > > > > > > > > LIBBPF_API int bpf_map_update_sized_elem(int fd, const void *key, size_t > > > > key_sz, > > > > const void *value, size_t value_sz, > > > > __u64 flags); > > > > LIBBPF_API int bpf_map_lookup_sized_elem(int fd, const void *key, size_t > > > > key_sz, > > > > void *value, size_t *value_sz, > > > > __u64 flags); > > > > LIBBPF_API int bpf_map_delete_sized_elem(int fd, const void *key, size_t > > > > key_sz, > > > > __u64 flags); > > > > LIBBPF_API int bpf_map_get_next_sized_key(int fd, > > > > const void *key, size_t key_sz, > > > > void *next_key, size_t > > > > *next_key_sz); > > > > > > I don't like this approach. > > > It looks messy to me and solving one specific case where > > > key/value is a blob of bytes. > > > In other words it's taking api to pre-BTF days when everything > > > was an opaque blob. > > > I think we need a new object dynptr-like that is composable with other types. > > > So that user can say that key is > > > struct map_key { > > > long foo; > > > dynptr_like array; > > > int bar; > > > }; > > > > > > I'm not sure whether the existing bpf_dynptr fits exactly, but it's > > > close enough. > > > Such dynptr_like object should be able to be used as a string. > > > And map should allow two such strings: > > > struct map_key { > > > dynptr_like file_name; > > > dynptr_like dir; > > > }; > > > > "bpf_byte_slice" or something like that? Or you want that memory to > > also not be just bytes and instead be yet another type? I.e., how far > > is this dynamic variably-sized concept will go? Just one level or > > more? > > Arrays of structs is imo overkill at this point. > So just one level and just bytes. yep, agreed > That will be enough to significantly improve bpftrace performance. > > > And when an update is done for such a key, map implementation will do > > extra memory allocations to create a copy, is that the idea? > > According to Hou's plan in qp-trie it will be one long binary key. > No extra allocs. I see, I missed that part. One part which made me say that this approach isn't compatible with qp-trie is that trie generally supports "find lexicographically next key" operation, but here it just won't work. But thinking about qp-trie some more, I'm not sure it actually supports lexicographically next key, so it's a moot point. > For hash map we can do similar. > Every key can be inline marshaled key without extra indirection. > Or each dynptr can map to exactly dynptr in a key. > Then extra alloc for data will be needed. > I suspect that will be slower and more complex code. > One marshaled key with all fields feels simpler. > I believe that's what Hou is proposing.