Re: APIs for qp-trie //Re: Question: Is it OK to assume the address of bpf_dynptr_kern will be 8-bytes aligned and reuse the lowest bits to save extra info ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Andrii,

On 6/25/2024 11:55 AM, Andrii Nakryiko wrote:
> On Mon, Jun 24, 2024 at 7:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> Sorry to resurrect the old thread to continue the discussion of APIs for
>> qp-trie.
>>
>> On 8/26/2023 2:33 AM, Andrii Nakryiko wrote:
>>> On Tue, Aug 22, 2023 at 6:12 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>> Hi,
>>>>
>> SNIP
>>
>>>> updated to allow using dynptr as map key for qp-trie.
>>>>> And that's the problem I just mentioned.
>>>>> PTR_TO_MAP_KEY is special. I don't think we should hack it to also
>>>>> mean ARG_PTR_TO_DYNPTR depending on the first argument (map type).
>>>> Sorry for misunderstanding your reply. But before switch to the kfuncl
>>>> way, could you please point me to some code or function which shows the
>>>> specialty of PTR_MAP_KEY ?
>>>>
>>>>
>>> Search in kernel/bpf/verifier.c how PTR_TO_MAP_KEY is handled. The
>>> logic assumes that there is associated struct bpf_map * pointer from
>>> which we know fixed-sized key length.
>>>
>>> But getting back to the topic at hand. I vaguely remember discussion
>>> we had, but it would be good if you could summarize it again here to
>>> avoid talking past each other. What is the bpf_map_ops changes you
>>> were thinking to do? How bpf_attr will look like? How BPF-side API for
>>> lookup/delete/update will look like? And then let's go from there?
>>> Thanks!
>>>
>>> .
>> The APIs for qp-trie are composed of the followings 5 parts:
>>
>> (1) map definition for qp-trie
>>
>> The key is bpf_dynptr and map_extra specifies the max length of key.
>>
>> struct {
>>     __uint(type, BPF_MAP_TYPE_QP_TRIE);
>>     __type(key, struct bpf_dynptr);
> I'm not sure we need `struct bpf_dynptr` as the key type. We can just
> say that key_size has to be zero, and actual keys are variable-sized.
>
> Alternatively, we can treat key_size as "maximum key size", any
> attempt to use longer keys will be rejected.
>
> But in either case "struct bpf_dynptr" as key type seems wrong to me.

The use of bpf_dynptr services two purposes:
(1) tell bpf subsystem that qp-trie is a map with variable-size key.
If don't use bpf_dynptr, the purpose can be accomplished by checking the
map_type.

(2) facilitate the dump of key in bpftool when btf is available
when dump the key & value tuple through btf dump, a btf_type is needed
for the key. Because the key is variable-size, so neither using void
type (key_size =0 case)  nor using the type with the maximal key size
are appropriate. But the use of bpf_dynptr can also be avoided, if we
add a special case for qp-trie when dumping its key.

Setting key_size as zero seems weird, because qp-trie has key. I prefer
to set key_size as the maximal key size and set the btf_key_id as 0.
>>     __type(value, unsigned int);
>>     __uint(map_flags, BPF_F_NO_PREALLOC);
>>     __uint(map_extra, 1024);
>> } qp_trie SEC(".maps");
>>
>> (2) bpf_attr
>>
>> Add key_sz & next_key_sz into anonymous struct to support map with
>> variable-size key. We could add value_sz if the map with variable-size
>> value is supported in the future.
>>
>>         struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
>>                 __u32           map_fd;
>>                 __aligned_u64   key;
>>                 union {
>>                         __aligned_u64 value;
>>                         __aligned_u64 next_key;
>>                 };
>>                 __u64           flags;
>>                 __u32           key_sz;
>>                 __u32           next_key_sz;
>>         };
>>
> Yep, this seems inevitable. And yes, value_sz seems like a reasonable
> thing to have. It might be an option/flag whether QP-trie has
> fixed-sized or variable-sized value, I guess. But we can get there
> after all the other things are figured out.

Do we need to add value_sz with the qp-trie patch-set or later ? I
prefer to leave it as the future work.
>
>> (3) libbpf API
>>
>> Add bpf_map__get_next_sized_key() to high level APIs.
> All the *_sized_* names are... unfortunate, tbh. I'm not sure what's
> the right naming, but "sized" in the middle doesn't seem that. I think
> it should be a uniform suffix. Maybe something like "_varsz",
> "_varlen", or at least "_sized" (but as a suffix)?...

I see. I have considered bpf_map_update_vs_key_elem() or similar, but I
changed it to _sized later. Because bpf_map_update_sized_elem() not only
supports variable-sized key, but also supports fixed-size key,  so I
think it is a bit weird that the high level API bpf_map__update_elem()
invokes bpf_map__update_elem_varlen() in turn to update element for map
with fixed-size key. I will try to add _sized as a suffix in the formal
patch set.
>
>> LIBBPF_API int bpf_map__get_next_sized_key(const struct bpf_map *map,
>>                                            const void *cur_key,
>>                                            size_t cur_key_sz,
>>                                            void *next_key, size_t
>> *next_key_sz);

SNIP
>>
>>
>>
>> (4) bpf_map_ops
>>
>> Update the arguments for map_get_next_key()/map_lookup_elem_sys_only().
>> Add map_update_elem_sys_only()/map_delete_elem_sys_only() into bpf_map_ops.
>>
>> Updating map_update_elem()/map_delete_elem() is also fine, but it may
>> introduce too much churn and need to pass map->key_size to these APIs
>> for existing callers.
>>
> We can have a protocol that key_size and value_size might be zero for
> fixed-sized maps, in which case key/value size is not
> checked/enforced, right?

Yes. We could pass 0 as key_size for the existing callers of
->map_update_elem()/->map_delete_elem()
>
> I think it's much better to keep one universal interface that works
> for both fixed- and variable-sized map (especially that we can
> technically have maps where fixed-sized or variable-sized is a matter
> of choice and some map_flag value).

I see your point. Will update
map_update_elem()/map_delete_elem()/map_lookup_elem() instead.
>
>> struct bpf_map_ops {
>>         int (*map_get_next_key)(struct bpf_map *map, void *key, u32
>> key_size, void *next_key, u32 *next_key_size);
>>         void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void
>> *key, u32 key_size);
>>
>>         int (*map_update_elem_sys_only)(struct bpf_map *map, void *key,
>> u32 key_size, void *value, u64 flags);
>>         int (*map_delete_elem_sys_only)(struct bpf_map *map, void *key,
>> u32 key_size);
>> };
>>
>> (5) API for bpf program
>>
>> Instead of supporting bpf_dynptr as ARG_PTR_TO_MAP_KEY, will add three
>> new kfuncs to support lookup/update/deletion operation on qp-trie.
>>
> hopefully those won't be qp-trie specific? Also, are you planning to
> have only key variable-sized or value as well?

Will make these kfuncs be available for all maps with variable-size key
support. I think it is better to have both variable-sized key and value
in these kfuncs. WDYT ?
>





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux