Re: [PATCH bpf-next v4 2/5] libbpf: Add "map_extra" as a per-map-type extra flag

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 21 Oct 2021 14:41:19 -0700

On Thu, Oct 21, 2021 at 1:14 PM Joanne Koong <joannekoong@xxxxxx> wrote:
>
> On 10/20/21 2:21 PM, Andrii Nakryiko wrote:
>
> > On Wed, Oct 20, 2021 at 2:09 PM Joanne Koong <joannekoong@xxxxxx> wrote:
> >> On 10/8/21 4:19 PM, Andrii Nakryiko wrote:
> >>
> >>> On Wed, Oct 6, 2021 at 3:27 PM Joanne Koong <joannekoong@xxxxxx> wrote:
> >>>> This patch adds the libbpf infrastructure for supporting a
> >>>> per-map-type "map_extra" field, whose definition will be
> >>>> idiosyncratic depending on map type.
> >>>>
> >>>> For example, for the bitset map, the lower 4 bits of map_extra
> >>>> is used to denote the number of hash functions.
> >>>>
> >>>> Signed-off-by: Joanne Koong <joannekoong@xxxxxx>
> >>>> ---
> >>>>    include/uapi/linux/bpf.h        |  1 +
> >>>>    tools/include/uapi/linux/bpf.h  |  1 +
> >>>>    tools/lib/bpf/bpf.c             |  1 +
> >>>>    tools/lib/bpf/bpf.h             |  1 +
> >>>>    tools/lib/bpf/bpf_helpers.h     |  1 +
> >>>>    tools/lib/bpf/libbpf.c          | 25 ++++++++++++++++++++++++-
> >>>>    tools/lib/bpf/libbpf.h          |  4 ++++
> >>>>    tools/lib/bpf/libbpf.map        |  2 ++
> >>>>    tools/lib/bpf/libbpf_internal.h |  4 +++-
> >>>>    9 files changed, 38 insertions(+), 2 deletions(-)
> >>>>
> >>>> [...]
> >>>>
> >>>> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> >>>> index 7d1741ceaa32..41e3e85e7789 100644
> >>>> --- a/tools/lib/bpf/bpf.c
> >>>> +++ b/tools/lib/bpf/bpf.c
> >>>> @@ -97,6 +97,7 @@ int bpf_create_map_xattr(const struct bpf_create_map_attr *create_attr)
> >>>>           attr.btf_key_type_id = create_attr->btf_key_type_id;
> >>>>           attr.btf_value_type_id = create_attr->btf_value_type_id;
> >>>>           attr.map_ifindex = create_attr->map_ifindex;
> >>>> +       attr.map_extra = create_attr->map_extra;
> >>>>           if (attr.map_type == BPF_MAP_TYPE_STRUCT_OPS)
> >>>>                   attr.btf_vmlinux_value_type_id =
> >>>>                           create_attr->btf_vmlinux_value_type_id;
> >>>> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> >>>> index 6fffb3cdf39b..c4049f2d63cc 100644
> >>>> --- a/tools/lib/bpf/bpf.h
> >>>> +++ b/tools/lib/bpf/bpf.h
> >>>> @@ -50,6 +50,7 @@ struct bpf_create_map_attr {
> >>>>                   __u32 inner_map_fd;
> >>>>                   __u32 btf_vmlinux_value_type_id;
> >>>>           };
> >>>> +       __u32 map_extra;
> >>> this struct is frozen, we can't change it. It's fine to not allow
> >>> passing map_extra in libbpf APIs. We have libbpf 1.0 task to revamp
> >>> low-level APIs like map creation in a way that will allow good
> >>> extensibility. You don't have to worry about that in this patch set.
> >> I see! From my understanding, without "map_extra" added to the
> >> bpf_create_map_attr struct, it's not possible in the subsequent
> >> bloom filter benchmark tests to set the map_extra flag, which
> > Didn't you add bpf_map__set_map_extra() setter for that? Also one can
> > always do direct bpf syscall (see sys_bpf in tools/lib/bpf/bpf.c), if
> > absolutely necessary. But set_map_extra() setter is the way to go for
> > benchmark, I think.
> bpf_map__set_map_extra() sets the map_extra field for the bpf_map
> struct, but that field can't get propagated through to the kernel
> when the BPF_MAP_CREATE syscall is called in bpf_map_create_xattr.
> This is because bpf_map_create_xattr takes in a "bpf_create_map_attr"
> struct to instantiate the "bpf_attr" struct it passes to the kernel, but
> map_extra is not part of "bpf_create_map_attr" struct and can't be
> added since the struct is frozen.

Oh, that's where the problem is. Libbpf internally doesn't have to use
bpf_create_map_xattr(). We are going to revamp all these low-level
interfaces to be extensible, but until then, I think it will be fine
to just create an internal helper that would allow us to create maps
without restrictions of maintaining API compatibility. See what we did
with libbpf__bpf_prog_load().

>
> I don't think doing a direct bpf syscall in the userspace program,
> and then passing the "int bloom_map_fd" to the bpf program
> through the skeleton works either. This is because in the bpf program,
> we can't call bpf_map_peek/push since these only take in a
> "struct bpf_map *", and not an fd. We can't go from fd -> struct bpf_map *
> either with something like
>
> struct fd f = fdget(bloom_map_fd);
> struct bpf_map *map = __bpf_map_get(f);
>
> since both "__bpf_map_get" and "fdget" are not functions bpf programs
> can call.

On BPF side there is no "struct bpf_map", actually. bpf_map_peek()
takes just "void *" which will be just a reference to the variable
that represents the map (and BPF verifier actually does the right
thing during program load, passing correct kernel address of the map).
On user-space side, though, user can use bpf_map__reuse_fd() to set
everything up, if they create map with their own custom logic.

But we are getting too much into the weeds. Let's just copy/paste
bpf_map_create_xattr() for now and add map_extra support there. And
pretty soon we'll have a nicer set of low-level APIs, at which point
we'll switch to using them internally as well.

>
>
> >> means we can't set the number of hash functions. (The entrypoint
> >> for propagating the flags to the kernel at map creation time is
> >> in the function "bpf_create_map_xattr", which takes in a
> >> struct bpf_create_map_attr).
> >>
> >> 1) To get the benchmark numbers for different # of hash functions, I'll
> >> test using a modified version of the code where the map_extra flags
> >> gets propagated to the kernel. I'll add a TODO to the benchmarks
> >> saying that the specified # of hash functions will get propagated for real
> >> once libbpf's map creation supports map_extra.
> >>
> >>
> >> 2) Should I  drop this libbpf patch altogether from this patchset, and add
> >> it when we do the libbpf 1.0 task to revamp the map creation APIs? Since
> >> without extending map creation to include the map_extra, these map_extra
> >> libbpf changes don't have much effect right now
> > No, getter/setter API is good to have, please keep them.
> >
> >>> [...]
> >>>> --
> >>>> 2.30.2
> >>>>