Re: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map

Yafang Shao <laoar.shao@xxxxxxxxx> · Tue, 2 Aug 2022 21:47:02 +0800

On Tue, Aug 2, 2022 at 12:55 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Fri, Jul 29, 2022 at 03:23:16PM +0000, Yafang Shao wrote:
> > A new member memcg_fd is introduced into bpf attr of BPF_MAP_CREATE
> > command, which is the fd of an opened cgroup directory. In this cgroup,
> > the memory subsystem must be enabled. This value is valid only when
> > BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel get the
> > memory cgroup from this fd, it will set this memcg into bpf map, then
> > all the subsequent memory allocation of this map will be charge to the
> > memcg.
> >
> > The map creation paths in libbpf are also changed consequently.
> >
> > Currently it is only supported for cgroup2 directory.
> >
> > The usage of this new member as follows,
> >       struct bpf_map_create_opts map_opts = {
> >               .sz = sizeof(map_opts),
> >               .map_flags = BPF_F_SELECTABLE_MEMCG,
> >       };
> >       int memcg_fd, int map_fd;
> >       int key, value;
> >
> >       memcg_fd = open("/cgroup2", O_DIRECTORY);
> >       if (memcg_fd < 0) {
> >               perror("memcg dir open");
> >               return -1;
> >       }
> >
> >       map_opts.memcg_fd = memcg_fd;
> >       map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg",
> >                               sizeof(key), sizeof(value),
> >                               1024, &map_opts);
> >       if (map_fd <= 0) {
> >               perror("map create");
> >               return -1;
> >       }
>
> Overall the api extension makes sense.
> The flexibility of selecting memcg is useful.
>

Thanks!

> > Signed-off-by: Yafang Shao <laoar.shao@xxxxxxxxx>
> > ---
> >  include/uapi/linux/bpf.h       |  2 ++
> >  kernel/bpf/syscall.c           | 47 ++++++++++++++++++++++++++--------
> >  tools/include/uapi/linux/bpf.h |  2 ++
> >  tools/lib/bpf/bpf.c            |  1 +
> >  tools/lib/bpf/bpf.h            |  3 ++-
> >  tools/lib/bpf/libbpf.c         |  2 ++
> >  6 files changed, 46 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index d5fc1ea70b59..a6e02c8be924 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -1296,6 +1296,8 @@ union bpf_attr {
> >                                                  * struct stored as the
> >                                                  * map value
> >                                                  */
> > +             __s32   memcg_fd;       /* selectable memcg */
> > +             __s32   :32;            /* hole */
>
> new fields cannot be inserted in the middle of uapi struct.
>

There's a "#define BPF_MAP_CREATE_LAST_FIELD map_extra" in
kernel/bpf/syscall.c, and thus I thought it may have some special
meaning, so I put the new field above it.
Now that it doesn't have any special meaning, I will change it as you suggested.

> >               /* Any per-map-type extra fields
> >                *
> >                * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 6401cc417fa9..9900e2b87315 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
> >  }
> >
> >  #ifdef CONFIG_MEMCG_KMEM
> > -static void bpf_map_save_memcg(struct bpf_map *map)
> > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
> >  {
> > -     /* Currently if a map is created by a process belonging to the root
> > -      * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > -      * So we have to check map->objcg for being NULL each time it's
> > -      * being used.
> > -      */
> > -     map->objcg = get_obj_cgroup_from_current();
> > +     struct obj_cgroup *objcg;
> > +     struct cgroup *cgrp;
> > +
> > +     if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) {
>
> The flag is unnecessary. Just add memcg_fd to the end of attr and use != 0
> as a condition that it should be used instead of get_obj_cgroup_from_current().
> There are other parts of bpf uapi that have similar fd handling logic.
>

Right. There's a ensure_good_fd() to make the fd a positive number.
I will change it.

> > +             cgrp = cgroup_get_from_fd(attr->memcg_fd);
> > +             if (IS_ERR(cgrp))
> > +                     return -EINVAL;
> > +
> > +             objcg = get_obj_cgroup_from_cgroup(cgrp);
> > +             if (IS_ERR(objcg))
> > +                     return PTR_ERR(objcg);
> > +     } else {
> > +             /* Currently if a map is created by a process belonging to the root
> > +              * memory cgroup, get_obj_cgroup_from_current() will return NULL.
> > +              * So we have to check map->objcg for being NULL each time it's
> > +              * being used.
> > +              */
> > +             objcg = get_obj_cgroup_from_current();
> > +     }
> > +
> > +     map->objcg = objcg;
> > +     return 0;
> >  }
> >
> >  static void bpf_map_release_memcg(struct bpf_map *map)
> > @@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size,
> >  }
> >
> >  #else
> > -static void bpf_map_save_memcg(struct bpf_map *map)
> > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr)
> >  {
> > +     return 0;
> >  }
> >
> >  static void bpf_map_release_memcg(struct bpf_map *map)
> > @@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node)
>
> High level uapi struct should not be passed into low level helper like this.
> Pls pass memcg_fd instead.
>

Sure, I will do it.

-- 
Regards
Yafang