Re: [PATCH v2 bpf-next 1/2] libbpf: auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sat, 11 Dec 2021 18:22:46 -0800

On Sat, Dec 11, 2021 at 11:36 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii@xxxxxxxxxx> writes:
>
> > The need to increase RLIMIT_MEMLOCK to do anything useful with BPF is
> > one of the first extremely frustrating gotchas that all new BPF users go
> > through and in some cases have to learn it a very hard way.
> >
> > Luckily, starting with upstream Linux kernel version 5.11, BPF subsystem
> > dropped the dependency on memlock and uses memcg-based memory accounting
> > instead. Unfortunately, detecting memcg-based BPF memory accounting is
> > far from trivial (as can be evidenced by this patch), so in practice
> > most BPF applications still do unconditional RLIMIT_MEMLOCK increase.
> >
> > As we move towards libbpf 1.0, it would be good to allow users to forget
> > about RLIMIT_MEMLOCK vs memcg and let libbpf do the sensible adjustment
> > automatically. This patch paves the way forward in this matter. Libbpf
> > will do feature detection of memcg-based accounting, and if detected,
> > will do nothing. But if the kernel is too old, just like BCC, libbpf
> > will automatically increase RLIMIT_MEMLOCK on behalf of user
> > application ([0]).
> >
> > As this is technically a breaking change, during the transition period
> > applications have to opt into libbpf 1.0 mode by setting
> > LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK bit when calling
> > libbpf_set_strict_mode().
> >
> > Libbpf allows to control the exact amount of set RLIMIT_MEMLOCK limit
> > with libbpf_set_memlock_rlim_max() API. Passing 0 will make libbpf do
> > nothing with RLIMIT_MEMLOCK. libbpf_set_memlock_rlim_max() has to be
> > called before the first bpf_prog_load(), bpf_btf_load(), or
> > bpf_object__load() call, otherwise it has no effect and will return
> > -EBUSY.
> >
> >   [0] Closes: https://github.com/libbpf/libbpf/issues/369
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
>
> The probing approach breaks with out-of-order backports, I suppose.
> Hopefully no one will do those for that particular patch, though (it's
> not really a bugfix), and at least for RHEL we did backport them
> together.
>
> Can't think of any better ways of doing the detection either, but maybe
> something to be aware of in the future (i.e., "don't change things in a
> way that can't be detected from userspace")?
>
> Anyway, with the nits below:
>
> Acked-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
>
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 6b2407e12060..7c82136979bf 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
>
> [...]
>
> > +     /* attempt loading freplace trying to use custom BTF */
> > +     memset(&attr, 0, bpf_load_attr_sz);
> > +     attr.prog_type = BPF_PROG_TYPE_TRACING;
> > +     attr.expected_attach_type = BPF_TRACE_FENTRY;
>
> This comment also seems to be disagreeing with the code it's commenting
> on?
>
> [...]
>
> > +static bool memlock_bumped;
> > +static rlim_t memlock_rlim_max = RLIM_INFINITY;
> > +
> > +int libbpf_set_memlock_rlim_max(size_t memlock_max)
> > +{
> > +     if (memlock_bumped)
> > +             return libbpf_err(-EBUSY);
> > +
> > +     memlock_rlim_max = memlock_max;
> > +     return 0;
> > +}
> > +
> > +int bump_rlimit_memlock(void)
> > +{
> > +     struct rlimit rlim;
> > +
> > +     /* this the default in libbpf 1.0, but for now user has to opt-in explicitly */
> > +     if (!(libbpf_mode & LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK))
> > +             return 0;
> > +
> > +     /* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */
> > +     if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT))
> > +             return 0;
> > +
> > +     memlock_bumped = true;
> > +
> > +     /* zero memlock_rlim_max disables auto-bumping RLIMIT_MEMLOCK */
> > +     if (memlock_rlim_max == 0)
> > +             return 0;
> > +
> > +     rlim.rlim_cur = rlim.rlim_max = memlock_rlim_max;
> > +     if (setrlimit(RLIMIT_MEMLOCK, &rlim))
> > +             return -errno;
> > +
> > +     return 0;
> > +}
> > +
>
> "rlim_max" seems to imply this will only ever increase the limit, but if
> I'm reading the code correctly it could actually end up lowering the
> effective limit?

_max suffix is confusing to me as well.
libbpf_set_memlock_rlim() would be more accurate.

The other part of the patch:
+               /* assume memcg accounting is supported if we get -ENOTSUPP */
+               err = errno == 524 /* ENOTSUPP */ ? 1 : 0;

is dangerous.
ENOTSUPP is not a standard error code.
The kernel returns it from a few places, but we might fix
all of them eventually and return EOPNOTSUPP everywhere.
In other words user space cannot rely on this 524 error code number.