Hi,

On 10/1/2022 5:35 AM, Andrii Nakryiko wrote:
> On Wed, Sep 28, 2022 at 7:11 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
SNIP
>>> I'm trying to understand why there should be so many new concepts and
>>> interfaces just to allow variable-sized keys. Can you elaborate on
>>> that? Like why do we even need BPF_DYNPTR_TYPE_USER? Why user can't
>>> just pass a void * (casted to u64) pointer and size of the memory
>>> pointed to it, and kernel will just copy necessary amount of data into
>>> kvmalloc'ed temporary region?
>> The main reason is that map operations from the syscall and from bpf programs
>> use the same ops in bpf_map_ops (e.g. map_update_elem). If dynptr_kern were
>> used only for bpf programs, we would have to define three new operations for
>> bpf programs. Moreover, even after defining two different map ops for the same
>> operation from the syscall and from bpf programs, the internal implementation
>> of qp-trie would still need to convert these two different representations of
>> a variable-length key into bpf_qp_trie_key. That introduces an unnecessary
>> conversion, so I think it may be a good idea to pass dynptr_kern to qp-trie
>> even for the bpf syscall.
>>
>> Also, in bpf_attr there is currently no space to pass an extra key size for the
>> BPF_MAP_*_ELEM commands. bpf_attr can be extended, but even if it is, libbpf
>> would need to provide a new group of APIs to operate on dynptr-keyed maps,
>> because userspace needs to pass the key size as a new argument.
> You are right that the current assumption of implicit key/value size
> doesn't work for these variable-key/value-length maps. But I think the
> right answer is actually to make sure that we have a map_update_elem
> callback variant that accepts key/value size explicitly. I still think
> that the syscall interface shouldn't introduce a concept of dynptr.
> From user-space's point of view dynptr is just a memory pointer +
> associated memory size. Let's keep it simple.
> And yes, it will be a new libbpf API for
> bpf_map_lookup_elem/bpf_map_update_elem. That's fine.

Is your point that dynptr is too complicated for user space and may be
confused with the dynptr in kernel space? How about a different name, or a
simple definition like bpf_lpm_trie_key? That would make both the
implementation and the usage much simpler, because both the implementation
and users could keep using the same APIs as for fixed-size maps.

Besides lookup/update/delete, we would also need to define new ops for
get_next_key and lookup_and_delete_elem, as well as corresponding new bpf
helpers for bpf programs.

And by "explicit key/value size", do you mean something like the following?

    int (*map_update_elem)(struct bpf_map *map, void *key, u32 key_size,
                           void *value, u32 value_size, u64 flags);

>
>>> It also seems like you want to allow key (and maybe value as well, not
>>> sure) to be a custom user-defined type where some of the fields are
>>> struct bpf_dynptr. I think it's a big overcomplication, tbh. I'd say
>>> it's enough to just say that entire key has to be described by a
>>> single bpf_dynptr. Then we can have bpf_map_lookup_elem_dynptr(map,
>>> key_dynptr, flags) new helper to provide variable-sized key for
>>> lookup.
>> qp-trie will only support a single dynptr as the map key. In the future,
>> other maps may support map keys with embedded dynptrs. Maybe Joanne can
>> share some vision about such a use case.
> My point was that instead of saying that key is some fixed-size struct
> in which one of the fields is dynptr (and then when comparing you have
> to compare part of struct, then dynptr contents, then the other part
> of struct?), just say that entire key is represented by dynptr,
> implicitly (it's just a blob of bytes). That seems more
> straightforward.
I see. But I still think there are possible use cases for a struct with
embedded dynptrs. For a bpf map in the kernel, a byte blob is OK.
But if it is also a blob of bytes for the bpf program or the userspace
application, the application may need to marshal and unmarshal between the
byte blob and a meaningful struct type every time before using it.
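The "explicit key/value size" callback variant discussed above can be sketched as a toy userspace map whose keys are opaque byte blobs. Everything here is hypothetical: `struct blob_map`, `blob_map_update_elem()` and `blob_map_lookup_elem()` are illustrative names, not kernel or libbpf APIs; malloc() stands in for kvmalloc(); and the lookup copies the value out (like the syscall path) instead of returning a pointer as the in-kernel callback does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Kernel-style aliases, defined locally so the sketch builds in userspace. */
typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical toy map: keys are opaque byte blobs, values are a single u64. */
struct blob_entry {
	void *key;
	u32 key_size;
	u64 value;
	struct blob_entry *next;
};

struct blob_map {
	struct blob_entry *head;
};

/* Two keys match only if both their sizes and their bytes are equal. */
static struct blob_entry *blob_find(struct blob_map *map, const void *key, u32 key_size)
{
	struct blob_entry *e;

	for (e = map->head; e; e = e->next)
		if (e->key_size == key_size && memcmp(e->key, key, key_size) == 0)
			return e;
	return NULL;
}

/* Same shape as the callback quoted above: each size travels with its pointer.
 * Only flags == 0 (BPF_ANY in the real UAPI) is accepted in this sketch. */
static int blob_map_update_elem(struct blob_map *map, void *key, u32 key_size,
				void *value, u32 value_size, u64 flags)
{
	struct blob_entry *e;

	if (!key_size || value_size != sizeof(u64) || flags)
		return -EINVAL;

	e = blob_find(map, key, key_size);
	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return -ENOMEM;
		/* Copy exactly key_size bytes from the caller's buffer. */
		e->key = malloc(key_size);
		if (!e->key) {
			free(e);
			return -ENOMEM;
		}
		memcpy(e->key, key, key_size);
		e->key_size = key_size;
		e->next = map->head;
		map->head = e;
	}
	memcpy(&e->value, value, sizeof(u64));
	return 0;
}

/* Hypothetical lookup: copies the value into the caller's buffer. */
static int blob_map_lookup_elem(struct blob_map *map, void *key, u32 key_size,
				void *value, u32 value_size)
{
	struct blob_entry *e;

	if (value_size != sizeof(u64))
		return -EINVAL;
	e = blob_find(map, key, key_size);
	if (!e)
		return -ENOENT;
	memcpy(value, &e->value, sizeof(u64));
	return 0;
}

/* Release every entry and its copied key. */
static void blob_map_free(struct blob_map *map)
{
	struct blob_entry *e = map->head, *next;

	while (e) {
		next = e->next;
		free(e->key);
		free(e);
		e = next;
	}
	map->head = NULL;
}
```

Because the size travels with the pointer, key_size becomes part of the key's identity: "eth0" passed with and without its trailing NUL byte are two different keys.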
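The marshaling step mentioned above can be sketched as follows. It assumes a hypothetical application key, `struct app_key`, with two fixed fields and a variable-length name, and an ad hoc blob layout (fields packed in host byte order with no padding). The names and the layout are illustrative only and are not a proposed kernel ABI.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical application-level key: two fixed fields plus a variable-length name. */
struct app_key {
	uint32_t ifindex;
	uint16_t name_len;
	const char *name;	/* name_len bytes, not NUL-terminated */
};

/* Blob layout assumed by this sketch: ifindex | name_len | name bytes.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t app_key_marshal(const struct app_key *k, unsigned char *buf, size_t buf_len)
{
	size_t need = sizeof(k->ifindex) + sizeof(k->name_len) + k->name_len;
	unsigned char *p = buf;

	if (buf_len < need)
		return 0;
	memcpy(p, &k->ifindex, sizeof(k->ifindex));
	p += sizeof(k->ifindex);
	memcpy(p, &k->name_len, sizeof(k->name_len));
	p += sizeof(k->name_len);
	if (k->name_len)
		memcpy(p, k->name, k->name_len);
	return need;
}

/* Inverse of app_key_marshal(). k->name points into blob, so blob must outlive k.
 * Returns 0 on success, -1 if the blob is truncated or its length is inconsistent. */
static int app_key_unmarshal(const unsigned char *blob, size_t blob_len, struct app_key *k)
{
	size_t hdr = sizeof(k->ifindex) + sizeof(k->name_len);

	if (blob_len < hdr)
		return -1;
	memcpy(&k->ifindex, blob, sizeof(k->ifindex));
	memcpy(&k->name_len, blob + sizeof(k->ifindex), sizeof(k->name_len));
	if (blob_len != hdr + k->name_len)
		return -1;
	k->name = (const char *)(blob + hdr);
	return 0;
}
```

With a pure byte-blob key, every lookup or update from the application pays for a marshal call, and every key read back (for example via get_next_key) pays for an unmarshal; this round trip is what a key struct with an embedded dynptr is meant to avoid.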