On Fri, Jul 15, 2022 at 3:43 PM Yonghong Song <yhs@xxxxxx> wrote:
>
>
>
> On 7/15/22 12:24 PM, Andrii Nakryiko wrote:
> > Make libbpf adjust RINGBUF map size (rounding it up to closest power-of-2
> > of page_size) more eagerly: during open phase when initializing the map
> > and on explicit calls to bpf_map__set_max_entries().
> >
> > Such approach allows user to check actual size of BPF ringbuf even
> > before it's created in the kernel, but also it prevents various edge
> > case scenarios where BPF ringbuf size can get out of sync with what it
> > would be in kernel. One of them (reported in [0]) is during an attempt
> > to pin/reuse BPF ringbuf.
> >
> > Move adjust_ringbuf_sz() helper closer to its first actual use. The
> > implementation of the helper is unchanged.
> >
> > [0] Closes: https://github.com/libbpf/libbpf/issue/530
>
> The link is not accessible. https://github.com/libbpf/libbpf/pull/530
> is accessible.
>

hm... strange, when I tried back then it was redirecting issue to pull,
but now it doesn't work. I'll update to use pull/530.

> >
> > Fixes: 0087a681fa8c ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  tools/lib/bpf/libbpf.c | 77 +++++++++++++++++++++++-------------------
> >  1 file changed, 42 insertions(+), 35 deletions(-)
> >
> > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > index 68da1aca406c..2767d1897b4f 100644
> > --- a/tools/lib/bpf/libbpf.c
> > +++ b/tools/lib/bpf/libbpf.c
> > @@ -2320,6 +2320,37 @@ int parse_btf_map_def(const char *map_name, struct btf *btf,
> >  	return 0;
> >  }
> >
> > +static size_t adjust_ringbuf_sz(size_t sz)
> > +{
> > +	__u32 page_sz = sysconf(_SC_PAGE_SIZE);
> > +	__u32 mul;
> > +
> > +	/* if user forgot to set any size, make sure they see error */
> > +	if (sz == 0)
> > +		return 0;
> > +	/* Kernel expects BPF_MAP_TYPE_RINGBUF's max_entries to be
> > +	 * a power-of-2 multiple of kernel's page size. If user diligently
> > +	 * satisified these conditions, pass the size through.
> > +	 */
> > +	if ((sz % page_sz) == 0 && is_pow_of_2(sz / page_sz))
> > +		return sz;
> > +
> > +	/* Otherwise find closest (page_sz * power_of_2) product bigger than
> > +	 * user-set size to satisfy both user size request and kernel
> > +	 * requirements and substitute correct max_entries for map creation.
> > +	 */
> > +	for (mul = 1; mul <= UINT_MAX / page_sz; mul <<= 1) {
> > +		if (mul * page_sz > sz)
> > +			return mul * page_sz;
> > +	}
> > +
> > +	/* if it's impossible to satisfy the conditions (i.e., user size is
> > +	 * very close to UINT_MAX but is not a power-of-2 multiple of
> > +	 * page_size) then just return original size and let kernel reject it
> > +	 */
> > +	return sz;
> > +}
> > +
> >  static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def)
> >  {
> >  	map->def.type = def->map_type;
> > @@ -2333,6 +2364,10 @@ static void fill_map_from_def(struct bpf_map *map, const struct btf_map_def *def
> >  	map->btf_key_type_id = def->key_type_id;
> >  	map->btf_value_type_id = def->value_type_id;
> >
> > +	/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
> > +	if (map->def.type == BPF_MAP_TYPE_RINGBUF)
> > +		map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
> > +
> >  	if (def->parts & MAP_DEF_MAP_TYPE)
> >  		pr_debug("map '%s': found type = %u.\n", map->name, def->map_type);
> >
> > @@ -4306,9 +4341,15 @@ struct bpf_map *bpf_map__inner_map(struct bpf_map *map)
> >
> >  int bpf_map__set_max_entries(struct bpf_map *map, __u32 max_entries)
> >  {
> > -	if (map->fd >= 0)
> > +	if (map->obj->loaded)
> >  		return libbpf_err(-EBUSY);
> This change is not explained in the commit message. Could you explain
> why this change? In libbpf.c, there are multiple places using
> map->fd >= 0, not sure whether there are any potential issues for
> those cases or not.

obj->loaded is a more reliable way to check that bpf_object__load()
already happened. It used to be equivalent to checking that the map has
an FD assigned, but with bpf_map__set_autocreate(map, false) we can now
have a loaded object and no map FD. So I just made the check more
reliable while modifying this API.

I don't know if it's worth adding a comment here, it seems pretty
self-explanatory in the code, but I'll add a sentence about this to the
commit message.

>
> > +
> >  	map->def.max_entries = max_entries;
> > +
> > +	/* auto-adjust BPF ringbuf map max_entries to be a multiple of page size */
> > +	if (map->def.type == BPF_MAP_TYPE_RINGBUF)
> > +		map->def.max_entries = adjust_ringbuf_sz(map->def.max_entries);
> > +
> >  	return 0;
> >  }
> >
> [...]
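
For reference, below is a minimal user-space sketch (not part of the
thread or the patch) of how the eager adjustment discussed above becomes
observable to callers before bpf_object__load(). The object file name
"ringbuf.bpf.o" and the map name "rb" are hypothetical; the libbpf calls
used (bpf_object__open_file(), bpf_object__find_map_by_name(),
bpf_map__set_max_entries(), bpf_map__max_entries()) are existing public
API, and the NULL-on-error behavior assumes libbpf 1.0 semantics.

#include <stdio.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_map *map;

	/* hypothetical object containing a BPF_MAP_TYPE_RINGBUF map named "rb" */
	obj = bpf_object__open_file("ringbuf.bpf.o", NULL);
	if (!obj)
		return 1;

	map = bpf_object__find_map_by_name(obj, "rb");
	if (!map)
		goto out;

	/* request ~100KB (102400); with 4KB pages this is not a power-of-2
	 * multiple of the page size, so libbpf rounds it up to
	 * 32 * 4096 = 131072
	 */
	bpf_map__set_max_entries(map, 100 * 1024);

	/* with this patch the adjusted size is already visible here,
	 * before the map is ever created in the kernel
	 */
	printf("max_entries = %u\n", bpf_map__max_entries(map));

out:
	bpf_object__close(obj);
	return 0;
}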