Re: APIs for qp-trie //Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 26, 2024 at 4:59 PM Andrii Nakryiko
<andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Tue, Jun 25, 2024 at 7:06 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > Sorry to resurrect the old thread to continue the discussion of APIs for
> > > qp-trie.
> > >
> > > On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
> > > > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > > >> Hi,
> > > >>
> > >
> > > SNIP
> > >
> > > >> updated to allow using dynptr as map key for qp-trie.
> > > >>> And that's the problem I just mentioned.
> > > >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
> > > >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
> > > >> Sorry for misunderstanding your reply. But before switch to the kfuncl
> > > >> way, could you please point me to some code or function which shows the
> > > >> specialty of PTR_MAP_KEY ?
> > > >>
> > > >>
> > > > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
> > > > logic assumes that there is associated struct bpf_map * pointer from
> > > > which we know fixed-sized key length.
> > > >
> > > > But getting back to the topic at hand. I vaguely remember discussion
> > > > we had, but it would be good if you could summarize it again here to
> > > > avoid talking past each other. What is the bpf_map_ops changes you
> > > > were thinking to do? How bpf_attr will look like? How BPF-side API for
> > > > lookup/delete/update will look like? And then let's go from there?
> > > > Thanks!
> > > >
> > > > .
> > >
> > > The APIs for qp-trie are composed of the followings 5 parts:
> > >
> > > (1) map definition for qp-trie
> > >
> > > The key is bpf_dynptr and map_extra specifies the max length of key.
> > >
> > > struct {
> > >     __uint(type, BPF_MAP_TYPE_QP_TRIE);
> > >     __type(key, struct bpf_dynptr);
> > >     __type(value, unsigned int);
> > >     __uint(map_flags, BPF_F_NO_PREALLOC);
> > >     __uint(map_extra, 1024);
> > > } qp_trie SEC(".maps");
> > >
> > > (2) bpf_attr
> > >
> > > Add key_sz & next_key_sz into anonymous struct to support map with
> > > variable-size key. We could add value_sz if the map with variable-size
> > > value is supported in the future.
> > >
> > >         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> > >                 __u32           map_fd;
> > >                 __aligned_u64   key;
> > >                 union {
> > >                         __aligned_u64 value;
> > >                         __aligned_u64 next_key;
> > >                 };
> > >                 __u64           flags;
> > >                 __u32           key_sz;
> > >                 __u32           next_key_sz;
> > >         };
> > >
> > > (3) libbpf API
> > >
> > > Add bpf_map__get_next_sized_key() to high level APIs.
> > >
> > > LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map,
> > >                                            const void *cur_key,
> > >                                            size_t cur_key_sz,
> > >                                            void *next_key, size_t
> > > *next_key_sz);
> > >
> > > Add
> > > bpf_map_update_sized_elem()/bpf_map_lookup_sized_elem()/bpf_map_delete_sized_elem()/bpf_map_get_next_sized_key()
> > > to low level APIs.
> > > These APIs have already considered the case in which map has
> > > variable-size value, so there will be no need to add other new APIs to
> > > support such case.
> > >
> > > LIBBPF_API int bpf_map_update_sized_elem(int fd, const void *key, size_t
> > > key_sz,
> > >                                          const void *value, size_t value_sz,
> > >                                          __u64 flags);
> > > LIBBPF_API int bpf_map_lookup_sized_elem(int fd, const void *key, size_t
> > > key_sz,
> > >                                          void *value, size_t *value_sz,
> > >                                          __u64 flags);
> > > LIBBPF_API int bpf_map_delete_sized_elem(int fd, const void *key, size_t
> > > key_sz,
> > >                                          __u64 flags);
> > > LIBBPF_API int bpf_map_get_next_sized_key(int fd,
> > >                                           const void *key, size_t key_sz,
> > >                                           void *next_key, size_t
> > > *next_key_sz);
> >
> > I don't like this approach.
> > It looks messy to me and solving one specific case where
> > key/value is a blob of bytes.
> > In other words it's taking api to pre-BTF days when everything
> > was an opaque blob.
> > I think we need a new object dynptr-like that is composable with other types.
> > So that user can say that key is
> > struct map_key {
> >    long foo;
> >    dynptr_like array;
> >    int bar;
> > };
> >
> > I'm not sure whether the existing bpf_dynptr fits exactly, but it's
> > close enough.
> > Such dynptr_like object should be able to be used as a string.
> > And map should allow two such strings:
> > struct map_key {
> >    dynptr_like file_name;
> >    dynptr_like dir;
> > };
>
> "bpf_byte_slice" or something like that? Or you want that memory to
> also not be just bytes and instead be yet another type? I.e., how far
> is this dynamic variably-sized concept will go? Just one level or
> more?

Arrays of structs is imo overkill at this point.
So just one level and just bytes.
That will be enough to significantly improve bpftrace performance.

> And when an update is done for such a key, map implementation will do
> extra memory allocations to create a copy, is that the idea?

According to Hou's plan in qp-trie it will be one long binary key.
No extra allocs.
For hash map we can do similar.
Every key can be inline marshaled key without extra indirection.
Or each dynptr can map to exactly dynptr in a key.
Then extra alloc for data will be needed.
I suspect that will be slower and more complex code.
One marshaled key with all fields feels simpler.
I believe that's what Hou is proposing.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux