On Fri, Dec 10, 2021 at 3:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii@xxxxxxxxxx> writes:
>
> > The need to increase RLIMIT_MEMLOCK to do anything useful with BPF is
> > one of the first extremely frustrating gotchas that all new BPF users go
> > through and in some cases have to learn it a very hard way.
> >
> > Luckily, starting with upstream Linux kernel version 5.11, BPF subsystem
> > dropped the dependency on memlock and uses memcg-based memory accounting
> > instead. Unfortunately, detecting memcg-based BPF memory accounting is
> > far from trivial (as can be evidenced by this patch), so in practice
> > most BPF applications still do unconditional RLIMIT_MEMLOCK increase.
> >
> > As we move towards libbpf 1.0, it would be good to allow users to forget
> > about RLIMIT_MEMLOCK vs memcg and let libbpf do the sensible adjustment
> > automatically. This patch paves the way forward in this matter. Libbpf
> > will do feature detection of memcg-based accounting, and if detected,
> > will do nothing. But if the kernel is too old, just like BCC, libbpf
> > will automatically increase RLIMIT_MEMLOCK on behalf of user
> > application ([0]).
> >
> > As this is technically a breaking change, during the transition period
> > applications have to opt into libbpf 1.0 mode by setting
> > LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK bit when calling
> > libbpf_set_strict_mode().
> >
> > Libbpf allows to control the exact amount of set RLIMIT_MEMLOCK limit
> > with libbpf_set_memlock_rlim_max() API. Passing 0 will make libbpf do
> > nothing with RLIMIT_MEMLOCK. libbpf_set_memlock_rlim_max() has to be
> > called before the first bpf_prog_load(), bpf_btf_load(), or
> > bpf_object__load() call, otherwise it has no effect and will return
> > -EBUSY.
> >
> >   [0] Closes: https://github.com/libbpf/libbpf/issues/369
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  tools/lib/bpf/bpf.c             | 122 ++++++++++++++++++++++++++++++++
> >  tools/lib/bpf/bpf.h             |   2 +
> >  tools/lib/bpf/libbpf.c          |  47 +++---------
> >  tools/lib/bpf/libbpf.map        |   1 +
> >  tools/lib/bpf/libbpf_internal.h |  39 ++++++++++
> >  tools/lib/bpf/libbpf_legacy.h   |  12 +++-
> >  6 files changed, 184 insertions(+), 39 deletions(-)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 4e7836e1a7b5..5b14111b80dd 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -29,6 +29,7 @@
> >  #include <errno.h>
> >  #include <linux/bpf.h>
> >  #include <limits.h>
> > +#include <sys/resource.h>
> >  #include "bpf.h"
> >  #include "libbpf.h"
> >  #include "libbpf_internal.h"
> > @@ -94,6 +95,119 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
> >         return fd;
> >  }
> >
> > +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> > + * memcg-based memory accounting for BPF maps and progs. This was done in [0],
> > + * but it's not straightforward to detect those changes from the user-space.
> > + * So instead we'll try to detect whether [1] is in the kernel, which follows
> > + * [0] almost immediately and made it into the upstream kernel in the same
> > + * release.
> > + *
> > + * For this, we'll upload a trivial BTF into the kernel and will try to load
> > + * a trivial BPF program with attach_btf_obj_fd pointing to our BTF. If it
> > + * returns anything but -EOPNOTSUPP, we'll assume we still need
> > + * RLIMIT_MEMLOCK.
> > + *
> > + * [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@xxxxxx/
> > + * [1] Fixes: 8bdd8e275ede ("bpf: Return -ENOTSUPP when attaching to non-kernel BTF")
> > + */
> > +int probe_memcg_account(void)
> > +{
> > +       const size_t bpf_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
> > +       const size_t btf_load_attr_sz = offsetofend(union bpf_attr, btf_log_level);
> > +       int prog_fd = -1, btf_fd = -1, err;
> > +       struct bpf_insn insns[1] = {}; /* instructions don't matter */
> > +       const void *btf_data;
> > +       union bpf_attr attr;
> > +       __u32 btf_data_sz;
> > +       struct btf *btf;
> > +
> > +       /* create the simplest BTF object and upload it into the kernel */
> > +       btf = btf__new_empty();
> > +       err = libbpf_get_error(btf);
> > +       if (err)
> > +               return err;
> > +       err = btf__add_int(btf, "int", 4, 0);
> > +       btf_data = btf__raw_data(btf, &btf_data_sz);
> > +       if (!btf_data) {
> > +               err = -ENOMEM;
> > +               goto cleanup;
> > +       }
> > +
> > +       /* we won't use bpf_btf_load() or bpf_prog_load() because they will
> > +        * be trying to detect this feature and will cause an infinite loop
> > +        */
> > +       memset(&attr, 0, btf_load_attr_sz);
> > +       attr.btf = ptr_to_u64(btf_data);
> > +       attr.btf_size = btf_data_sz;
> > +       btf_fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, btf_load_attr_sz);
> > +       if (btf_fd < 0) {
> > +               err = -errno;
> > +               goto cleanup;
> > +       }
> > +
> > +       /* attempt loading freplace trying to use custom BTF */
> > +       memset(&attr, 0, bpf_load_attr_sz);
> > +       attr.prog_type = BPF_PROG_TYPE_TRACING;
> > +       attr.expected_attach_type = BPF_TRACE_FENTRY;
> > +       attr.insns = ptr_to_u64(insns);
> > +       attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
> > +       attr.license = ptr_to_u64("GPL");
> > +       attr.attach_btf_obj_fd = btf_fd;
> > +
> > +       prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, bpf_load_attr_sz);
> > +       if (prog_fd >= 0)
> > +               /* bug? we shouldn't be able to successfully load program */
> > +               err = -EFAULT;
> > +       else
> > +               /* assume memcg accounting is supported if we get -EOPNOTSUPP */
> > +               err = errno == 524 /* ENOTSUPP */ ? 1 : 0;
>
> Nit: comment and code seems to be in disagreement over the name of the
> error code here.

One careful reader there :) There is also a comment about bpf_link_create,
which I removed because BPF_LINK_CREATE didn't depend on memlock. I'll
update both and will send v2, thanks.

> (side note, since we've ended up using ENOTSUPP everywhere in BPF, any
> reason we can't export it in a UAPI header somewhere?)

No opinion here, tbh, but certainly not part of this change.

> -Toke
>