On Thu, Oct 13, 2022 at 9:02 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 10/14/2022 2:04 AM, Andrii Nakryiko wrote:
> > On Fri, Oct 7, 2022 at 7:40 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> On 10/1/2022 5:35 AM, Andrii Nakryiko wrote:
> >>> On Wed, Sep 28, 2022 at 7:11 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> SNIP
> >>>>> I'm trying to understand why there should be so many new concepts and
> >>>>> interfaces just to allow variable-sized keys. Can you elaborate on
> >>>>> that? Like why do we even need BPF_DYNPTR_TYPE_USER? Why user can't
> >>>>> just pass a void * (casted to u64) pointer and size of the memory
> >>>>> pointed to it, and kernel will just copy necessary amount of data into
> >>>>> kvmalloc'ed temporary region?
> >>>> The main reason is that map operations from syscall and bpf program use the same
> >>>> ops in bpf_map_ops (e.g. map_update_elem). If only use dynptr_kern for bpf
> >>>> program, then have to define three new operations for bpf program. Even more,
> >>>> after defining two different map ops for the same operation from syscall and
> >>>> bpf program, the internal implementation of qp-trie still needs to convert these
> >>>> two different representations of variable-length key into bpf_qp_trie_key. It
> >>>> introduces unnecessary conversion, so I think it may be a good idea to pass
> >>>> dynptr_kern to qp-trie even for bpf syscall.
> >>>>
> >>>> And now in bpf_attr, for BPF_MAP_*_ELEM command, there is no space to pass an
> >>>> extra key size. It seems bpf_attr can be extended, but even if it is extended, it
> >>>> also means in libbpf we need to provide a new API group to support operating on
> >>>> dynptr key map, because the userspace needs to pass the key size as a new argument.
> >>> You are right that the current assumption of implicit key/value size
> >>> doesn't work for these variable-key/value-length maps.
> >>> But I think the
> >>> right answer is actually to make sure that we have a map_update_elem
> >>> callback variant that accepts key/value size explicitly. I still think
> >>> that the syscall interface shouldn't introduce a concept of dynptr.
> >>> From user-space's point of view dynptr is just a memory pointer +
> >>> associated memory size. Let's keep it simple. And yes, it will be a
> >>> new libbpf API for bpf_map_lookup_elem/bpf_map_update_elem. That's
> >>> fine.
> >> Is your point that dynptr is too complicated for user-space and may lead to
> >> confusion between dynptr in kernel space ? How about a different name or a
> > No, dynptr is just an unnecessary concept for user-space, because
> > fundamentally it's just a memory region, which in UAPI is represented
> > by a pointer + size. So why inventing new concepts when existing ones
> > are covering it?
> But the problem is pointer + explicit size is not being covered by any existing
> APIs and we need to add support for it. Using dynptr is one option and directly
> using pointer + explicit size is another one.

dynptr is more than pointer + size (it supports various types of
memory it points to, it supports offset, etc), it's a more generic thing
for BPF-side programmability. There is no need to expose it into
user-space. All we care about here is memory region, which is pointer
+ size. Keep it simple.

> >
> >> simple definition just like bpf_lpm_trie_key ? It will make both the
> >> implementation and the usage much simpler, because the implementation and the
> >> user can still use the same APIs just like fixed sized map.
> >>
> >> Not just lookup/update/delete, we also need to define a new op for
> >> get_next_key/lookup_and_delete_elem. And also need to define corresponding new
> >> bpf helpers for bpf program. And you said "explicit key/value size", do you mean
> >> something below ?
> >>
> >> int (*map_update_elem)(struct bpf_map *map, void *key, u32 key_size, void
> >> *value, u32 value_size, u64 flags);
> > Yes, something like that. The problem is that up until now we assume
> > that key_size is fixed and can be derived from map definition. We are
> > trying to change that, so there needs to be a change in internal APIs.
> Will need to change both the UAPIs and internal APIs. Should I add variable-size
> map value into consideration this time ? I am afraid that it may be a little
> over-designed. Maybe I should hack out a demo first to check the work-load and
> the complexity.

I think sticking to fixed-size value for starters is ok, there is
plenty of things to figure out even without that. We can try attacking
variable-sized value BPF maps (e.g., technically BPF hashmap might also
support variable-sized key or value just as well) as a separate
project.

> >
> >>>
> >>>>> It also seems like you want to allow key (and maybe value as well, not
> >>>>> sure) to be a custom user-defined type where some of the fields are
> >>>>> struct bpf_dynptr. I think it's a big overcomplication, tbh. I'd say
> >>>>> it's enough to just say that entire key has to be described by a
> >>>>> single bpf_dynptr. Then we can have bpf_map_lookup_elem_dynptr(map,
> >>>>> key_dynptr, flags) new helper to provide variable-sized key for
> >>>>> lookup.
> >>>> For qp-trie, it will only support a single dynptr as the map key. In the future
> >>>> maybe other map will support map key with embedded dynptrs. Maybe Joanne can
> >>>> share some vision about such use case.
> >>> My point was that instead of saying that key is some fixed-size struct
> >>> in which one of the fields is dynptr (and then when comparing you have
> >>> to compare part of struct, then dynptr contents, then the other part
> >>> of struct?), just say that entire key is represented by dynptr,
> >>> implicitly (it's just a blob of bytes). That seems more
> >>> straightforward.
> >> I see.
> >> But I still think there is a possible use case for struct with embedded
> >> dynptr. For bpf map in kernel, byte blob is OK. But if it is also a blob of
> >> bytes for the bpf program or userspace application, the application may need to
> >> marshal and unmarshal between the bytes blob and a meaningful struct type
> >> each time before using it.
> >>> .
> > I'm not sure what you mean by "blob of bytes for userspace
> > application"? You mean a pointer pointing to some process' memory (not
> > a kernel memory)? How is that going to work if BPF program can run and
> > access such blob in any context, not just in the context of original
> > user-space app that set this value?
> >
> > If you mean that blob needs to be interpreted as some sort of struct,
> > then yes, it's easy, we have bpf_dynptr_data() and `void *` -> `struct
> > my_custom_struct` casting in C.
> Yes. I mean we need to cast the blob to a meaningful struct before using it. If
> there is one variable-length field in the struct, how would the direct
> casting work as shown below ?
>
> struct my_custom_struct {
>     struct {
>         unsigned int len;
>         char *data;
>     } name;
>     unsigned int pt_code;
> };

I'd imagine that you'd represent variable-sized part at the end of
fixed part as flexible array of bytes:

struct my_custom_struct {
    int pt_code;
    int len;
    char data[];
};

> >
> > Or did I miss your point?
>
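
To make the flexible-array layout above a bit more concrete, here is a
minimal user-space sketch of building such a key as one contiguous blob
and casting it back. This is only an illustration under assumptions:
key_size(), make_key() and key_from_blob() are hypothetical helper names,
not part of the kernel, libbpf, or the patch set under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size fields first, variable-sized bytes last (C99 flexible array member). */
struct my_custom_struct {
    int pt_code;
    int len;        /* number of valid bytes in data[] */
    char data[];
};

/* Total blob size for a key that carries name_len bytes in data[]. */
static size_t key_size(size_t name_len)
{
    return offsetof(struct my_custom_struct, data) + name_len;
}

/* Hypothetical helper: allocate and fill one contiguous key blob; caller frees it. */
static struct my_custom_struct *make_key(int pt_code, const char *name, size_t name_len)
{
    struct my_custom_struct *key;

    assert(name != NULL);
    key = malloc(key_size(name_len));
    if (!key)
        return NULL;
    key->pt_code = pt_code;
    key->len = (int)name_len;
    memcpy(key->data, name, name_len);
    return key;
}

/*
 * Hypothetical helper: interpret an untyped (pointer, size) pair as the struct.
 * Returns NULL unless the embedded length agrees with the blob size.
 */
static const struct my_custom_struct *key_from_blob(const void *blob, size_t blob_size)
{
    const struct my_custom_struct *key = blob;

    if (blob_size < offsetof(struct my_custom_struct, data))
        return NULL;
    if (key->len < 0 || key_size((size_t)key->len) != blob_size)
        return NULL;
    return key;
}
```

Because the variable-sized part lives inside the same allocation, the
whole key is described by a plain pointer + size pair, and comparing two
keys reduces to comparing two byte ranges. With an embedded `char *data`
the blob would instead hold an address that is meaningless outside the
process that wrote it.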