Re: APIs for qp-trie //Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> Sorry to resurrect the old thread to continue the discussion of APIs for
> qp-trie.
>
> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
> > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
>
> SNIP
>
> >> updated to allow using dynptr as map key for qp-trie.
> >>> And that's the problem I just mentioned.
> >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
> >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
> >> Sorry for misunderstanding your reply. But before switch to the kfuncl
> >> way, could you please point me to some code or function which shows the
> >> specialty of PTR_MAP_KEY ?
> >>
> >>
> > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
> > logic assumes that there is associated struct bpf_map * pointer from
> > which we know fixed-sized key length.
> >
> > But getting back to the topic at hand. I vaguely remember discussion
> > we had, but it would be good if you could summarize it again here to
> > avoid talking past each other. What is the bpf_map_ops changes you
> > were thinking to do? How bpf_attr will look like? How BPF-side API for
> > lookup/delete/update will look like? And then let's go from there?
> > Thanks!
> >
> > .
>
> The APIs for qp-trie are composed of the followings 5 parts:
>
> (1) map definition for qp-trie
>
> The key is bpf_dynptr and map_extra specifies the max length of key.
>
> struct {
>     __uint(type, BPF_MAP_TYPE_QP_TRIE);
>     __type(key, struct bpf_dynptr);

I'm not sure we need `struct bpf_dynptr` as the key type. We can just
say that key_size has to be zero, and actual keys are variable-sized.

Alternatively, we can treat key_size as "maximum key size", any
attempt to use longer keys will be rejected.

But in either case "struct bpf_dynptr" as key type seems wrong to me.


>     __type(value, unsigned int);
>     __uint(map_flags, BPF_F_NO_PREALLOC);
>     __uint(map_extra, 1024);
> } qp_trie SEC(".maps");
>
> (2) bpf_attr
>
> Add key_sz & next_key_sz into anonymous struct to support map with
> variable-size key. We could add value_sz if the map with variable-size
> value is supported in the future.
>
>         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
>                 __u32           map_fd;
>                 __aligned_u64   key;
>                 union {
>                         __aligned_u64 value;
>                         __aligned_u64 next_key;
>                 };
>                 __u64           flags;
>                 __u32           key_sz;
>                 __u32           next_key_sz;
>         };
>

Yep, this seems inevitable. And yes, value_sz seems like a reasonable
thing to have. It might be an option/flag whether QP-trie has
fixed-sized or variable-sized value, I guess. But we can get there
after all the other things are figured out.

> (3) libbpf API
>
> Add bpf_map__get_next_sized_key() to high level APIs.

All the *_sized_* names are... unfortunate, tbh. I'm not sure what's
the right naming, but "sized" in the middle doesn't seem that. I think
it should be a uniform suffix. Maybe something like "_varsz",
"_varlen", or at least "_sized" (but as a suffix)?...

>
> LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map,
>                                            const void *cur_key,
>                                            size_t cur_key_sz,
>                                            void *next_key, size_t
> *next_key_sz);
>
> Add
> bpf_map_update_sized_elem()/bpf_map_lookup_sized_elem()/bpf_map_delete_sized_elem()/bpf_map_get_next_sized_key()
> to low level APIs.
> These APIs have already considered the case in which map has
> variable-size value, so there will be no need to add other new APIs to
> support such case.
>
> LIBBPF_API int bpf_map_update_sized_elem(int fd, const void *key, size_t
> key_sz,
>                                          const void *value, size_t value_sz,
>                                          __u64 flags);
> LIBBPF_API int bpf_map_lookup_sized_elem(int fd, const void *key, size_t
> key_sz,
>                                          void *value, size_t *value_sz,
>                                          __u64 flags);
> LIBBPF_API int bpf_map_delete_sized_elem(int fd, const void *key, size_t
> key_sz,
>                                          __u64 flags);
> LIBBPF_API int bpf_map_get_next_sized_key(int fd,
>                                           const void *key, size_t key_sz,
>                                           void *next_key, size_t
> *next_key_sz);
>
> (4) bpf_map_ops
>
> Update the arguments for map_get_next_key()/map_lookup_elem_sys_only().
> Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops.
>
> Updating map_update_elem()/map_delete_elem() is also fine, but it may
> introduce too much churn and need to pass map->key_size to these APIs
> for existing callers.
>

We can have a protocol that key_size and value_size might be zero for
fixed-sized maps, in which case key/value size is not
checked/enforced, right?

I think it's much better to keep one universal interface that works
for both fixed- and variable-sized map (especially that we can
technically have maps where fixed-sized or variable-sized is a matter
of choice and some map_flag value).

> struct bpf_map_ops {
>         int (*map_get_next_key)(struct bpf_map *map, void *key, u32
> key_size, void *next_key, u32 *next_key_size);
>         void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void
> *key, u32 key_size);
>
>         int (*map_update_elem_sys_only)(struct bpf_map *map, void *key,
> u32 key_size, void *value, u64 flags);
>         int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key,
> u32 key_size);
> };
>
> (5) API for bpf program
>
> Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three
> new kfuncs to support lookup/update/deletion operation on qp-trie.
>

hopefully those won't be qp-trie specific? Also, are you planning to
have only key variable-sized or value as well?

>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux