Re: APIs for qp-trie //Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 26, 2024 at 8:41 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Jun 26, 2024 at 4:59 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 25, 2024 at 7:06 PM Alexei Starovoitov
> > <alexei.starovoitov@xxxxxxxxx> wrote:
> > >
> > > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Sorry to resurrect the old thread to continue the discussion of APIs for
> > > > qp-trie.
> > > >
> > > > On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
> > > > > On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> > > > >> Hi,
> > > > >>
> > > >
> > > > SNIP
> > > >
> > > > >> updated to allow using dynptr as map key for qp-trie.
> > > > >>> And that's the problem I just mentioned.
> > > > >>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
> > > > >>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
> > > > >> Sorry for misunderstanding your reply. But before switch to the kfuncl
> > > > >> way, could you please point me to some code or function which shows the
> > > > >> specialty of PTR_MAP_KEY ?
> > > > >>
> > > > >>
> > > > > Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
> > > > > logic assumes that there is associated struct bpf_map * pointer from
> > > > > which we know fixed-sized key length.
> > > > >
> > > > > But getting back to the topic at hand. I vaguely remember discussion
> > > > > we had, but it would be good if you could summarize it again here to
> > > > > avoid talking past each other. What is the bpf_map_ops changes you
> > > > > were thinking to do? How bpf_attr will look like? How BPF-side API for
> > > > > lookup/delete/update will look like? And then let's go from there?
> > > > > Thanks!
> > > > >
> > > > > .
> > > >
> > > > The APIs for qp-trie are composed of the followings 5 parts:
> > > >
> > > > (1) map definition for qp-trie
> > > >
> > > > The key is bpf_dynptr and map_extra specifies the max length of key.
> > > >
> > > > struct {
> > > >     __uint(type, BPF_MAP_TYPE_QP_TRIE);
> > > >     __type(key, struct bpf_dynptr);
> > > >     __type(value, unsigned int);
> > > >     __uint(map_flags, BPF_F_NO_PREALLOC);
> > > >     __uint(map_extra, 1024);
> > > > } qp_trie SEC(".maps");
> > > >
> > > > (2) bpf_attr
> > > >
> > > > Add key_sz & next_key_sz into anonymous struct to support map with
> > > > variable-size key. We could add value_sz if the map with variable-size
> > > > value is supported in the future.
> > > >
> > > >         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> > > >                 __u32           map_fd;
> > > >                 __aligned_u64   key;
> > > >                 union {
> > > >                         __aligned_u64 value;
> > > >                         __aligned_u64 next_key;
> > > >                 };
> > > >                 __u64           flags;
> > > >                 __u32           key_sz;
> > > >                 __u32           next_key_sz;
> > > >         };
> > > >
> > > > (3) libbpf API
> > > >
> > > > Add bpf_map__get_next_sized_key() to high level APIs.
> > > >
> > > > LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map,
> > > >                                            const void *cur_key,
> > > >                                            size_t cur_key_sz,
> > > >                                            void *next_key, size_t
> > > > *next_key_sz);
> > > >
> > > > Add
> > > > bpf_map_update_sized_elem()/bpf_map_lookup_sized_elem()/bpf_map_delete_sized_elem()/bpf_map_get_next_sized_key()
> > > > to low level APIs.
> > > > These APIs have already considered the case in which map has
> > > > variable-size value, so there will be no need to add other new APIs to
> > > > support such case.
> > > >
> > > > LIBBPF_API int bpf_map_update_sized_elem(int fd, const void *key, size_t
> > > > key_sz,
> > > >                                          const void *value, size_t value_sz,
> > > >                                          __u64 flags);
> > > > LIBBPF_API int bpf_map_lookup_sized_elem(int fd, const void *key, size_t
> > > > key_sz,
> > > >                                          void *value, size_t *value_sz,
> > > >                                          __u64 flags);
> > > > LIBBPF_API int bpf_map_delete_sized_elem(int fd, const void *key, size_t
> > > > key_sz,
> > > >                                          __u64 flags);
> > > > LIBBPF_API int bpf_map_get_next_sized_key(int fd,
> > > >                                           const void *key, size_t key_sz,
> > > >                                           void *next_key, size_t
> > > > *next_key_sz);
> > >
> > > I don't like this approach.
> > > It looks messy to me and solving one specific case where
> > > key/value is a blob of bytes.
> > > In other words it's taking api to pre-BTF days when everything
> > > was an opaque blob.
> > > I think we need a new object dynptr-like that is composable with other types.
> > > So that user can say that key is
> > > struct map_key {
> > >    long foo;
> > >    dynptr_like array;
> > >    int bar;
> > > };
> > >
> > > I'm not sure whether the existing bpf_dynptr fits exactly, but it's
> > > close enough.
> > > Such dynptr_like object should be able to be used as a string.
> > > And map should allow two such strings:
> > > struct map_key {
> > >    dynptr_like file_name;
> > >    dynptr_like dir;
> > > };
> >
> > "bpf_byte_slice" or something like that? Or you want that memory to
> > also not be just bytes and instead be yet another type? I.e., how far
> > is this dynamic variably-sized concept will go? Just one level or
> > more?
>
> Arrays of structs is imo overkill at this point.
> So just one level and just bytes.

yep, agreed

> That will be enough to significantly improve bpftrace performance.
>
> > And when an update is done for such a key, map implementation will do
> > extra memory allocations to create a copy, is that the idea?
>
> According to Hou's plan in qp-trie it will be one long binary key.
> No extra allocs.

I see, I missed that part.

One part which made me say that this approach isn't compatible with
qp-trie is that trie generally supports "find lexicographically next
key" operation, but here it just won't work. But thinking about
qp-trie some more, I'm not sure it actually supports lexicographically
next key, so it's a moot point.

> For hash map we can do similar.
> Every key can be inline marshaled key without extra indirection.
> Or each dynptr can map to exactly dynptr in a key.
> Then extra alloc for data will be needed.
> I suspect that will be slower and more complex code.
> One marshaled key with all fields feels simpler.
> I believe that's what Hou is proposing.





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux