Re: APIs for qp-trie //Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 25, 2024 at 7:41 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi Andrii,
>
> On 6/25/2024 11:55 AM, Andrii Nakryiko wrote:
> > On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> Sorry to resurrect the old thread to continue the discussion of APIs for
> >> qp-trie.
> >>
> >> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
> >>> On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >>>> Hi,
> >>>>
> >> SNIP
> >>
> >>>> updated to allow using dynptr as map key for qp-trie.
> >>>>> And that's the problem I just mentioned.
> >>>>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
> >>>>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
> >>>> Sorry for misunderstanding your reply. But before switch to the kfuncl
> >>>> way, could you please point me to some code or function which shows the
> >>>> specialty of PTR_MAP_KEY ?
> >>>>
> >>>>
> >>> Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
> >>> logic assumes that there is associated struct bpf_map * pointer from
> >>> which we know fixed-sized key length.
> >>>
> >>> But getting back to the topic at hand. I vaguely remember discussion
> >>> we had, but it would be good if you could summarize it again here to
> >>> avoid talking past each other. What is the bpf_map_ops changes you
> >>> were thinking to do? How bpf_attr will look like? How BPF-side API for
> >>> lookup/delete/update will look like? And then let's go from there?
> >>> Thanks!
> >>>
> >>> .
> >> The APIs for qp-trie are composed of the followings 5 parts:
> >>
> >> (1) map definition for qp-trie
> >>
> >> The key is bpf_dynptr and map_extra specifies the max length of key.
> >>
> >> struct {
> >>     __uint(type, BPF_MAP_TYPE_QP_TRIE);
> >>     __type(key, struct bpf_dynptr);
> > I'm not sure we need `struct bpf_dynptr` as the key type. We can just
> > say that key_size has to be zero, and actual keys are variable-sized.
> >
> > Alternatively, we can treat key_size as "maximum key size", any
> > attempt to use longer keys will be rejected.
> >
> > But in either case "struct bpf_dynptr" as key type seems wrong to me.
>
> The use of bpf_dynptr services two purposes:
> (1) tell bpf subsystem that qp-trie is a map with variable-size key.
> If don't use bpf_dynptr, the purpose can be accomplished by checking the
> map_type.
>
> (2) facilitate the dump of key in bpftool when btf is available
> when dump the key & value tuple through btf dump, a btf_type is needed
> for the key. Because the key is variable-size, so neither using void
> type (key_size =0 case)  nor using the type with the maximal key size
> are appropriate. But the use of bpf_dynptr can also be avoided, if we
> add a special case for qp-trie when dumping its key.
>

both of those purposes are served just as well with something like
map_flags = BPF_F_VARLEN_KEY (naming made up for illustration purposes
only)

> Setting key_size as zero seems weird, because qp-trie has key. I prefer
> to set key_size as the maximal key size and set the btf_key_id as 0.

ok, I don't think either way is better/worse than the other

> >>     __type(value, unsigned int);
> >>     __uint(map_flags, BPF_F_NO_PREALLOC);
> >>     __uint(map_extra, 1024);
> >> } qp_trie SEC(".maps");
> >>
> >> (2) bpf_attr
> >>
> >> Add key_sz & next_key_sz into anonymous struct to support map with
> >> variable-size key. We could add value_sz if the map with variable-size
> >> value is supported in the future.
> >>
> >>         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
> >>                 __u32           map_fd;
> >>                 __aligned_u64   key;
> >>                 union {
> >>                         __aligned_u64 value;
> >>                         __aligned_u64 next_key;
> >>                 };
> >>                 __u64           flags;
> >>                 __u32           key_sz;
> >>                 __u32           next_key_sz;
> >>         };
> >>
> > Yep, this seems inevitable. And yes, value_sz seems like a reasonable
> > thing to have. It might be an option/flag whether QP-trie has
> > fixed-sized or variable-sized value, I guess. But we can get there
> > after all the other things are figured out.
>
> Do we need to add value_sz with the qp-trie patch-set or later ? I
> prefer to leave it as the future work.
> >

future work is probably fine, but we need to design with this
possibility in mind, that's all

> >> (3) libbpf API
> >>
> >> Add bpf_map__get_next_sized_key() to high level APIs.
> > All the *_sized_* names are... unfortunate, tbh. I'm not sure what's
> > the right naming, but "sized" in the middle doesn't seem that. I think
> > it should be a uniform suffix. Maybe something like "_varsz",
> > "_varlen", or at least "_sized" (but as a suffix)?...
>
> I see. I have considered bpf_map_update_vs_key_elem() or similar, but I
> changed it to _sized later. Because bpf_map_update_sized_elem() not only
> supports variable-sized key, but also supports fixed-size key,  so I
> think it is a bit weird that the high level API bpf_map__update_elem()
> invokes bpf_map__update_elem_varlen() in turn to update element for map
> with fixed-size key. I will try to add _sized as a suffix in the formal
> patch set.
> >
> >> LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map,
> >>                                            const void *cur_key,
> >>                                            size_t cur_key_sz,
> >>                                            void *next_key, size_t
> >> *next_key_sz);
>
> SNIP
> >>
> >>
> >>
> >> (4) bpf_map_ops
> >>
> >> Update the arguments for map_get_next_key()/map_lookup_elem_sys_only().
> >> Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops.
> >>
> >> Updating map_update_elem()/map_delete_elem() is also fine, but it may
> >> introduce too much churn and need to pass map->key_size to these APIs
> >> for existing callers.
> >>
> > We can have a protocol that key_size and value_size might be zero for
> > fixed-sized maps, in which case key/value size is not
> > checked/enforced, right?
>
> Yes. We could pass 0 as key_size for the existing callers of
> ->map_update_elem()/->map_delete_elem()
> >
> > I think it's much better to keep one universal interface that works
> > for both fixed- and variable-sized map (especially that we can
> > technically have maps where fixed-sized or variable-sized is a matter
> > of choice and some map_flag value).
>
> I see your point. Will update
> map_update_elem()/map_delete_elem()/map_lookup_elem() instead.
> >
> >> struct bpf_map_ops {
> >>         int (*map_get_next_key)(struct bpf_map *map, void *key, u32
> >> key_size, void *next_key, u32 *next_key_size);
> >>         void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void
> >> *key, u32 key_size);
> >>
> >>         int (*map_update_elem_sys_only)(struct bpf_map *map, void *key,
> >> u32 key_size, void *value, u64 flags);
> >>         int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key,
> >> u32 key_size);
> >> };
> >>
> >> (5) API for bpf program
> >>
> >> Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three
> >> new kfuncs to support lookup/update/deletion operation on qp-trie.
> >>
> > hopefully those won't be qp-trie specific? Also, are you planning to
> > have only key variable-sized or value as well?
>
> Will make these kfuncs be available for all maps with variable-size key
> support. I think it is better to have both variable-sized key and value
> in these kfuncs. WDYT ?

yep

> >
>
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux