Re: [PATCH bpf-next] API function for retrieving data from percpu map

On Wed, Apr 27, 2022 at 7:55 PM Grant Seltzer Richman
<grantseltzer@xxxxxxxxx> wrote:
>
> On Wed, Apr 27, 2022 at 6:16 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Wed, Apr 27, 2022 at 2:43 PM grantseltzer <grantseltzer@xxxxxxxxx> wrote:
> > >
> > > From: Grant Seltzer <grantseltzer@xxxxxxxxx>
> > >
> > > This adds a new API function used to retrieve all data from a
> > > percpu array or percpu hashmap.
> > >
> > > This is useful because the current interface for doing so
> > > requires knowledge of the layout of these BPF map internal
> > > structures.
> > >
> > > Signed-off-by: Grant Seltzer <grantseltzer@xxxxxxxxx>
> > > ---
> > >  tools/lib/bpf/libbpf.c | 28 ++++++++++++++++++++++++++++
> > >  tools/lib/bpf/libbpf.h | 25 +++++++++++++++++++++++++
> > >  2 files changed, 53 insertions(+)
> > >
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 873a29ce7781..8d72cff22688 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -36,6 +36,7 @@
> > >  #include <linux/perf_event.h>
> > >  #include <linux/ring_buffer.h>
> > >  #include <linux/version.h>
> > > +#include <linux/math.h>
> > >  #include <sys/epoll.h>
> > >  #include <sys/ioctl.h>
> > >  #include <sys/mman.h>
> > > @@ -4370,6 +4371,33 @@ int bpf_map__resize(struct bpf_map *map, __u32 max_entries)
> > >         return bpf_map__set_max_entries(map, max_entries);
> > >  }
> > >
> > > +void *bpf_map__get_percpu_value(const struct bpf_map *map, const void *key)
> >
> > I'd rather avoid API that allocates memory on behalf of user (and
> > requires user to later free it) if possible. In this case there is no
> > need for libbpf itself to allocate memory, user can allocate enough
> > memory and pass it to libbpf.
>
> I see, this makes sense. I also considered defining a macro instead,
> similar to `bpf_object__for_each_program`, except something like
> `bpf_map__for_each_cpu`.
>
> >
> > I'm actually working on few related APIs to avoid using low-level
> > bpf_map_update_elem() from user-space. I want to add
> > bpf_map__update_elem(), bpf_map__lookup_elem(),
> > bpf_map__delete_elem(). It's TBD how it will look like for per-cpu
> > maps, but I didn't plan to have a separate API for that. It would just
> > change expectations of value size to be a value_size * num_cpus.
>
> Ah ok, you actually mentioned this to me once before, about modeling
> the API to accept key/value sizes and validate based on bpf_map's
> definition. Please let me know if I can help with this, I'd be happy
> to tackle it.

It depends on whether you are planning to work on it soon. I'd like to
get this into v0.8 in the next week or two. If you think you can
actively work on it next week and iterate quickly if necessary, then
yes, please go ahead. I have a few other small things to wrap up besides
this, and this would free up a bit of time for me.

Basically, I was thinking to have an API like below:

int bpf_map__lookup_elem(const struct bpf_map *map, const void *key,
size_t key_sz, void *value, size_t value_sz);

and checking 1) that the map has been created (fd >= 0) and 2) that
key_sz/value_sz match our knowledge of the map's definition, which for
per-CPU maps would mean that value_sz is the actual value_size *
libbpf_num_possible_cpus().

Similarly for the other very popular operations on maps: update and
delete. We can probably also add wrappers for the BPF_MAP_GET_NEXT_KEY and
LOOKUP_AND_DELETE_ELEM commands. I didn't plan to add batch operations
yet, as they are much less popular, so I didn't want to overdo it.

>
>
> >
> > So stay tuned, hopefully very soon
> >
> > > +{
> > > +
> > > +       if (!(bpf_map__type(map) == BPF_MAP_TYPE_PERCPU_ARRAY ||
> > > +               bpf_map__type(map) == BPF_MAP_TYPE_PERCPU_HASH)) {
> > > +               return libbpf_err_ptr(-EINVAL);
> > > +       }
> > > +
> > > +       int num_cpus;
> > > +       __u32 value_size;
> > > +       num_cpus = libbpf_num_possible_cpus();
> > > +
> > > +       if (num_cpus < 0)
> > > +               return libbpf_err_ptr(-EBUSY);
> > > +
> > > +       value_size = bpf_map__value_size(map);
> > > +
> > > +       void *data = malloc(roundup(value_size, 8) * num_cpus);
> > > +       int err = bpf_map_lookup_elem(map->fd, key, data);
> > > +       if (err) {
> > > +               free(data);
> > > +               return libbpf_err_ptr(err);
> > > +       }
> > > +
> > > +       return data;
> > > +}
> > > +
> > >  static int
> > >  bpf_object__probe_loading(struct bpf_object *obj)
> > >  {
> > > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> > > index cdbfee60ea3e..99b218702dfb 100644
> > > --- a/tools/lib/bpf/libbpf.h
> > > +++ b/tools/lib/bpf/libbpf.h
> > > @@ -921,6 +921,31 @@ LIBBPF_API const void *bpf_map__initial_value(struct bpf_map *map, size_t *psize
> > >  LIBBPF_DEPRECATED_SINCE(0, 8, "use bpf_map__type() instead")
> > >  LIBBPF_API bool bpf_map__is_offload_neutral(const struct bpf_map *map);
> > >
> > > +/**
> > > + * @brief **bpf_map__get_percpu_value()** returns a pointer to an array
> > > + * of data stored in a per-cpu array or per-cpu hashmap at a specified
> > > + * key. Each element is padded to 8 bytes regardless of the value data
> > > + * type stored in the per-cpu map. The index of each element in the array
> > > + * corresponds with the cpu that the data was set from.
> > > + * @param map per-cpu array or per-cpu hashmap
> > > + * @param key the key or index in the map
> > > + * @return pointer to the array of data
> > > + *
> > > + * example usage:
> > > + *
> > > + *  values = bpf_map__get_percpu_value(bpfmap, (void*)&zero);
> > > + *  if (values == NULL) {
> > > + *     // error handling
> > > + *  }
> > > + *
> > > + *     void* ptr = values;
> > > + *  for (int i = 0; i < num_cpus; i++) {
> > > + *    printf("CPU %d: %ld\n", i, *(ulong*)ptr);
> > > + *    ptr += 8;
> > > + *  }
> > > + */
> > > +LIBBPF_API void *bpf_map__get_percpu_value(const struct bpf_map *map, const void *key);
> > > +
> > >  /**
> > >   * @brief **bpf_map__is_internal()** tells the caller whether or not the
> > >   * passed map is a special map created by libbpf automatically for things like
> > > --
> > > 2.34.1
> > >


