Hi, On 9/29/2022 8:16 AM, Andrii Nakryiko wrote: > On Sat, Sep 24, 2022 at 6:18 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> From: Hou Tao <houtao1@xxxxxxxxxx> >> >> Userspace application uses bpf syscall to lookup or update bpf map. It >> passes a pointer of fixed-size buffer to kernel to represent the map >> key. To support map with variable-length key, introduce bpf_dynptr_user >> to allow userspace to pass a pointer of bpf_dynptr_user to specify the >> address and the length of key buffer. And in order to represent dynptr >> from userspace, adding a new dynptr type: BPF_DYNPTR_TYPE_USER. Because >> BPF_DYNPTR_TYPE_USER-typed dynptr is not available from bpf program, so >> no verifier update is needed. >> >> Add dynptr_key_off in bpf_map to distinguish map with fixed-size key >> from map with variable-length. dynptr_key_off is less than zero for >> fixed-size key and can only be zero for dynptr key. >> >> For dynptr-key map, key btf type is bpf_dynptr and key size is 16, so >> use the lower 32-bits of map_extra to specify the maximum size of dynptr >> key. >> >> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx> >> --- > This is a great feature and you've put lots of high-quality work into > this! Looking forward to have qp-trie BPF map available. Apart from > your discussion with Alexie about locking and memory > allocation/reused, I have questions about this dynptr from user-space > interface. Let's discuss it in this patch to not interfere. > > I'm trying to understand why there should be so many new concepts and > interfaces just to allow variable-sized keys. Can you elaborate on > that? Like why do we even need BPF_DYNPTR_TYPE_USER? Why user can't > just pass a void * (casted to u64) pointer and size of the memory > pointed to it, and kernel will just copy necessary amount of data into > kvmalloc'ed temporary region? The main reason is that map operations from syscall and bpf program use the same ops in bpf_map_ops (e.g. map_update_elem). If only use dynptr_kern for bpf program, then have to define three new operations for bpf program. Even more, after defining two different map ops for the same operation from syscall and bpf program, the internal implementation of qp-trie still need to convert these two different representations of variable-length key into bpf_qp_trie_key. It introduces unnecessary conversion, so I think it may be a good idea to pass dynptr_kern to qp-trie even for bpf syscall. And now in bpf_attr, for BPF_MAP_*_ELEM command, there is no space to pass an extra key size. It seems bpf_attr can be extend, but even it is extented, it also means in libbpf we need to provide a new API group to support operationg on dynptr key map, because the userspace needs to pass the key size as a new argument. > > It also seems like you want to allow key (and maybe value as well, not > sure) to be a custom user-defined type where some of the fields are > struct bpf_dynptr. I think it's a big overcomplication, tbh. I'd say > it's enough to just say that entire key has to be described by a > single bpf_dynptr. Then we can have bpf_map_lookup_elem_dynptr(map, > key_dynptr, flags) new helper to provide variable-sized key for > lookup. For qp-trie, it will only support a single dynptr as the map key. In the future maybe other map will support map key with embedded dynptrs. Maybe Joanne can share some vision about such use case. > > I think it would keep it much simpler. But if I'm missing something, > it would be good to understand that. Thanks! > > >> include/linux/bpf.h | 8 +++ >> include/uapi/linux/bpf.h | 6 ++ >> kernel/bpf/map_in_map.c | 3 + >> kernel/bpf/syscall.c | 121 +++++++++++++++++++++++++++------ >> tools/include/uapi/linux/bpf.h | 6 ++ >> 5 files changed, 125 insertions(+), 19 deletions(-) >> > [...] > .