Re: [PATCH bpf-next v2 03/13] bpf: Support bpf_dynptr-typed map key in bpf syscall

Hi,

On 10/14/2022 2:04 AM, Andrii Nakryiko wrote:
> On Fri, Oct 7, 2022 at 7:40 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 10/1/2022 5:35 AM, Andrii Nakryiko wrote:
>>> On Wed, Sep 28, 2022 at 7:11 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> SNIP
>>>>> I'm trying to understand why there should be so many new concepts and
>>>>> interfaces just to allow variable-sized keys. Can you elaborate on
>>>>> that? Like why do we even need BPF_DYNPTR_TYPE_USER? Why user can't
>>>>> just pass a void * (casted to u64) pointer and size of the memory
>>>>> pointed to it, and kernel will just copy necessary amount of data into
>>>>> kvmalloc'ed temporary region?
>>>> The main reason is that map operations from the syscall and from bpf programs
>>>> use the same ops in bpf_map_ops (e.g. map_update_elem). If dynptr_kern is used
>>>> only for bpf programs, then we have to define three new operations for bpf
>>>> programs. Moreover, after defining two different map ops for the same operation
>>>> from the syscall and from bpf programs, the internal implementation of qp-trie
>>>> still needs to convert these two different representations of a variable-length
>>>> key into bpf_qp_trie_key. That introduces an unnecessary conversion, so I think
>>>> it may be a good idea to pass dynptr_kern to qp-trie even for the bpf syscall.
>>>>
>>>> And now in bpf_attr, for the BPF_MAP_*_ELEM commands, there is no space to pass
>>>> an extra key size. It seems bpf_attr can be extended, but even if it is
>>>> extended, it also means that in libbpf we need to provide a new API group to
>>>> support operating on dynptr-key maps, because userspace needs to pass the key
>>>> size as a new argument.
>>> You are right that the current assumption of implicit key/value size
>>> doesn't work for these variable-key/value-length maps. But I think the
>>> right answer is actually to make sure that we have a map_update_elem
>>> callback variant that accepts key/value size explicitly. I still think
>>> that the syscall interface shouldn't introduce a concept of dynptr.
>>> From user-space's point of view dynptr is just a memory pointer +
>>> associated memory size. Let's keep it simple. And yes, it will be a
>>> new libbpf API for bpf_map_lookup_elem/bpf_map_update_elem. That's
>>> fine.
>> Is your point that dynptr is too complicated for user-space and may lead to
>> confusion between dynptr in kernel space ? How about a different name or a
> No, dynptr is just an unnecessary concept for user-space, because
> fundamentally it's just a memory region, which in UAPI is represented
> by a pointer + size. So why inventing new concepts when existing ones
> are covering it?
But the problem is that pointer + explicit size is not covered by any existing
API, so we need to add support for it. Using dynptr is one option; passing a
pointer + explicit size directly is another.
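
As a rough illustration of the pointer + explicit size option, here is a
userspace sketch, not kernel code. The names toy_elem_attr, its key_size field,
TOY_MAX_KEY_SIZE and toy_copy_key() are all hypothetical, and malloc()/memcpy()
merely stand in for kvmalloc() and copy_from_user():

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_KEY_SIZE 4096u	/* hypothetical upper bound on key length */

/* Hypothetical attr layout: like the BPF_MAP_*_ELEM fields plus a key size. */
struct toy_elem_attr {
	uint32_t map_fd;
	uint32_t key_size;	/* assumed new field, not in today's bpf_attr */
	uint64_t key;		/* pointer cast to u64 */
	uint64_t value;
	uint64_t flags;
};

/* Copy key_size bytes of the caller's key into a temporary buffer. */
static int toy_copy_key(const struct toy_elem_attr *attr, void **keyp)
{
	void *buf;

	if (!attr->key || !attr->key_size || attr->key_size > TOY_MAX_KEY_SIZE)
		return -EINVAL;

	buf = malloc(attr->key_size);	/* stands in for kvmalloc() */
	if (!buf)
		return -ENOMEM;

	/* stands in for copy_from_user() */
	memcpy(buf, (const void *)(uintptr_t)attr->key, attr->key_size);
	*keyp = buf;
	return 0;
}
```

The caller owns the returned buffer and frees it after the map operation.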
>
>> simple definition just like bpf_lpm_trie_key ? It will make both the
>> implementation and the usage much simpler, because the implementation and the
>> user can still use the same APIs just like fixed sized map.
>>
>> Not just lookup/update/delete: we also need to define new ops for
>> get_next_key/lookup_and_delete_elem, and the corresponding new bpf helpers
>> for bpf programs. And you said "explicit key/value size"; do you mean
>> something like the following?
>>
>> int (*map_update_elem)(struct bpf_map *map, void *key, u32 key_size,
>>                        void *value, u32 value_size, u64 flags);
> Yes, something like that. The problem is that up until now we assume
> that key_size is fixed and can be derived from map definition. We are
> trying to change that, so there needs to be a change in internal APIs.
Both the UAPIs and the internal APIs will need to change. Should I take
variable-size map values into consideration this time? I am afraid that may be a
little over-designed. Maybe I should hack out a demo first to check the workload
and the complexity.
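
To make the internal-API side concrete, here is a toy userspace model of ops
that take explicit sizes. It is only a sketch under my own assumptions: the
toy_* names are hypothetical, the value is kept fixed-size for brevity, and a
linear array stands in for the real data structure (nothing here resembles
qp-trie).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8

struct toy_map;

/* Ops with explicit sizes, modeled on the signature quoted above. */
struct toy_map_ops {
	int (*map_update_elem)(struct toy_map *map, const void *key, uint32_t key_size,
			       const void *value, uint32_t value_size, uint64_t flags);
	void *(*map_lookup_elem)(struct toy_map *map, const void *key, uint32_t key_size);
};

struct toy_entry {
	void *key;		/* private copy of the key bytes */
	uint32_t key_size;	/* stored per entry, not per map */
	uint64_t value;		/* fixed-size value keeps the sketch short */
};

struct toy_map {
	const struct toy_map_ops *ops;
	struct toy_entry entries[TOY_MAX_ENTRIES];
	uint32_t nr;
};

static struct toy_entry *toy_find(struct toy_map *map, const void *key, uint32_t key_size)
{
	for (uint32_t i = 0; i < map->nr; i++) {
		struct toy_entry *e = &map->entries[i];

		/* Keys match only if both length and bytes are equal. */
		if (e->key_size == key_size && !memcmp(e->key, key, key_size))
			return e;
	}
	return NULL;
}

static int toy_update(struct toy_map *map, const void *key, uint32_t key_size,
		      const void *value, uint32_t value_size, uint64_t flags)
{
	struct toy_entry *e;

	if (!key_size || value_size != sizeof(e->value) || flags)
		return -EINVAL;

	e = toy_find(map, key, key_size);
	if (!e) {
		if (map->nr == TOY_MAX_ENTRIES)
			return -ENOSPC;
		e = &map->entries[map->nr];
		e->key = malloc(key_size);
		if (!e->key)
			return -ENOMEM;
		memcpy(e->key, key, key_size);
		e->key_size = key_size;
		map->nr++;
	}
	memcpy(&e->value, value, sizeof(e->value));
	return 0;
}

static void *toy_lookup(struct toy_map *map, const void *key, uint32_t key_size)
{
	struct toy_entry *e = toy_find(map, key, key_size);

	return e ? &e->value : NULL;
}

static const struct toy_map_ops toy_ops = {
	.map_update_elem = toy_update,
	.map_lookup_elem = toy_lookup,
};
```

With this shape, "a" (1 byte) and "ab" (2 bytes) are distinct keys in the same
map, which is exactly what the fixed key_size in the map definition cannot
express today.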
>
>>>
>>>>> It also seems like you want to allow key (and maybe value as well, not
>>>>> sure) to be a custom user-defined type where some of the fields are
>>>>> struct bpf_dynptr. I think it's a big overcomplication, tbh. I'd say
>>>>> it's enough to just say that entire key has to be described by a
>>>>> single bpf_dynptr. Then we can have bpf_map_lookup_elem_dynptr(map,
>>>>> key_dynptr, flags) new helper to provide variable-sized key for
>>>>> lookup.
>>>> For qp-trie, it will only support a single dynptr as the map key. In the future
>>>> maybe other map will support map key with embedded dynptrs. Maybe Joanne can
>>>> share some vision about such use case.
>>> My point was that instead of saying that key is some fixed-size struct
>>> in which one of the fields is dynptr (and then when comparing you have
>>> to compare part of struct, then dynptr contents, then the other part
>>> of struct?), just say that entire key is represented by dynptr,
>>> implicitly (it's just a blob of bytes). That seems more
>>> straightforward.
>> I see. But I still think there are possible use cases for a struct with an
>> embedded dynptr. For a bpf map in the kernel, a byte blob is OK. But if it is
>> also a blob of bytes for the bpf program or the userspace application, the
>> application may need to marshal and unmarshal between the byte blob and a
>> meaningful struct type each time before using it.
> I'm not sure what you mean by "blob of bytes for userspace
> application"? You mean a pointer pointing to some process' memory (not
> a kernel memory)? How is that going to work if BPF program can run and
> access such blob in any context, not just in the context of original
> user-space app that set this value?
>
> If you mean that blob needs to be interpreted as some sort of struct,
> then yes, it's easy, we have bpf_dynptr_data() and `void *` -> `struct
> my_custom_struct` casting in C.
Yes. I mean we need to cast the blob to a meaningful struct before using it. If
there is a variable-length field in the struct, how would the direct cast work
for a struct like the one below?

struct my_custom_struct {
        struct {
                unsigned int len;
                char *data;
        } name;
        unsigned int pt_code;
};
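
To spell out the concern: the data pointer above means nothing once the key is
flattened into a blob, so the blob needs its own layout and a pack/parse step.
A minimal sketch of that marshaling follows; flat_key_hdr, key_view, key_pack()
and key_view_parse() are hypothetical names, and placing the fixed fields
before the name bytes is only one possible layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat layout: fixed fields first, then name_len bytes of name. */
struct flat_key_hdr {
	uint32_t pt_code;
	uint32_t name_len;
};

/* Decoded view: name points into the blob, it is not a copy. */
struct key_view {
	uint32_t pt_code;
	uint32_t name_len;
	const char *name;
};

/* Marshal into blob; returns the number of bytes used, or 0 if it won't fit. */
static size_t key_pack(void *blob, size_t cap, uint32_t pt_code,
		       const char *name, uint32_t name_len)
{
	struct flat_key_hdr hdr = { .pt_code = pt_code, .name_len = name_len };

	if (cap < sizeof(hdr) || name_len > cap - sizeof(hdr))
		return 0;
	memcpy(blob, &hdr, sizeof(hdr));
	memcpy((char *)blob + sizeof(hdr), name, name_len);
	return sizeof(hdr) + name_len;
}

/* Unmarshal with bounds checks; returns 0 on success, -1 on a short blob. */
static int key_view_parse(const void *blob, size_t blob_size, struct key_view *out)
{
	struct flat_key_hdr hdr;

	if (blob_size < sizeof(hdr))
		return -1;
	memcpy(&hdr, blob, sizeof(hdr));	/* avoids alignment assumptions */
	if (hdr.name_len > blob_size - sizeof(hdr))
		return -1;

	out->pt_code = hdr.pt_code;
	out->name_len = hdr.name_len;
	out->name = (const char *)blob + sizeof(hdr);
	return 0;
}
```

Every user of such a key would have to go through a step like this before
touching the fields, which is the overhead I am worried about.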
>
> Or did I miss your point?



