Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 30, 2023 at 11:29 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi Andrii,
>
> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
> > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> SNIP
> >>>> Yes. bpf prog will use dynptr as the map key. The bpf program will use
> >>>> the same map helpers as hash map to operate on qp-trie and the verifier
> >>>> will be updated to allow using dynptr as map key for qp-trie.
> >>> And that's the problem I just mentioned.
> >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
> >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
> >> Sorry for misunderstanding your reply. But before switch to the kfunc
> >> way, could you please point me to some code or function which shows the
> >> specialty of PTR_MAP_KEY ?
> >>
> >>
> > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
> > logic assumes that there is associated struct bpf_map * pointer from
> > which we know fixed-sized key length.
>
> Thanks for the information. Will check that.
> >
> > But getting back to the topic at hand. I vaguely remember discussion
> > we had, but it would be good if you could summarize it again here to
> > avoid talking past each other. What is the bpf_map_ops changes you
> > were thinking to do? How bpf_attr will look like? How BPF-side API for
> > lookup/delete/update will look like? And then let's go from there?
> > Thanks!
>
> Sorry for the late reply. I am a bit distracted by other work this week.
>
> For bpf_attr, a new field 'dynkey_size' is added to support
> BPF_MAP_{LOOKUP/UPDATE/DELETE}_ELEM and BPF_MAP_GET_NEXT_KEY on qp-trie
> as shown below:
>
> struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
>         __u32           map_fd;
>         __aligned_u64   key;
>         union {
>                 __aligned_u64 value;
>                 __aligned_u64 next_key;
>         };
>         __u64           flags;
>         __u32           dynkey_size;    /* input/output for
>                                          * BPF_MAP_GET_NEXT_KEY. input
>                                          * only for other commands.
>                                          */

hm.. I wonder if it would be more elegant to add `key_size` and
`value_size`, and allow to specify it (optionally) even for maps that
have fixed-size keys and values. Return error if expected key/value
size doesn't match map definition. From libbpf side, libbpf can be
smart to not set it on older kernels (or if user didn't provide this
information). But for bpf_map__lookup_elem() and other higher-level
APIs, we should have all this information available.


> };
>
> And 4 new APIs are added in libbpf to support basic operations on qp-trie:
>
> LIBBPF_API int bpf_map_update_dynkey_elem(int fd, const void *key,
> unsigned int key_size, const void *value, __u64 flags);
> LIBBPF_API int bpf_map_lookup_dynkey_elem(int fd, const void *key,
> unsigned int key_size, void *value);
> LIBBPF_API int bpf_map_delete_dynkey_elem(int fd, const void *key,
> unsigned int key_size);
> LIBBPF_API int bpf_map_get_next_dynkey(int fd, const void *key, void
> *next_key, unsigned int *key_size);
>
> About 3 weeks again, I have used the lowest bit of key pointer in
> .map_lookup_elem/.map_update_elem/.map_delete_elem to distinguish
> between bpf_user_dynkey-typed key from syscall and bpf_dynptr_kern-typed
> key from bpf program. The definition of bpf_user_dynkey and its
> allocation method are shown below. bpf syscall uses it to allocate
> variable-sized key and passes it to qp-trie.
>
> /* Allocate bpf_user_dynkey and its data together */
> struct bpf_user_dynkey {
>         unsigned int size;
>         void *data;
> };
>
> static void *bpf_new_user_dynkey(unsigned int size)
> {
>         struct bpf_user_dynkey *dynkey;
>         size_t total;
>
>         total = round_up(sizeof(*dynkey) + size, 2);
>         dynkey = kvmalloc(total, GFP_USER | __GFP_NOWARN);
>         if (!dynkey)
>                 return ERR_PTR(-ENOMEM);
>
>         dynkey->size = size;
>         dynkey->data = &dynkey[1];
>         return (void *)((long)dynkey | BPF_USER_DYNKEY_MARK);
> }
>
>
> After Alexei suggested that bit hack is only OK for memory or
> performance reason, I'm planning to add 2 new callbacks in bpf_map_ops
> to support update/delete operations in bpf syscall as shown below, but I
> have tried it yet.
>
> /* map is generic key/value storage optionally accessible by eBPF
> programs */
> struct bpf_map_ops {
>         /* funcs callable from userspace (via syscall) */
>         /* ...... */
>         void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key);

a bit confused, did you mean to also have key_size as a third argument here?

>         long (*map_update_elem_sys_only)(struct bpf_map *map, void *key,
> void *value, u64 flags);
>         long (*map_delete_elem_sys_only)(struct bpf_map *map, void *key);
>         /* ...... */
> };
>
>
>
>
>
>
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux