On Sat, Dec 11, 2021 at 11:36 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > Andrii Nakryiko <andrii@xxxxxxxxxx> writes: > > > The need to increase RLIMIT_MEMLOCK to do anything useful with BPF is > > one of the first extremely frustrating gotchas that all new BPF users go > > through and in some cases have to learn it a very hard way. > > > > Luckily, starting with upstream Linux kernel version 5.11, BPF subsystem > > dropped the dependency on memlock and uses memcg-based memory accounting > > instead. Unfortunately, detecting memcg-based BPF memory accounting is > > far from trivial (as can be evidenced by this patch), so in practice > > most BPF applications still do unconditional RLIMIT_MEMLOCK increase. > > > > As we move towards libbpf 1.0, it would be good to allow users to forget > > about RLIMIT_MEMLOCK vs memcg and let libbpf do the sensible adjustment > > automatically. This patch paves the way forward in this matter. Libbpf > > will do feature detection of memcg-based accounting, and if detected, > > will do nothing. But if the kernel is too old, just like BCC, libbpf > > will automatically increase RLIMIT_MEMLOCK on behalf of user > > application ([0]). > > > > As this is technically a breaking change, during the transition period > > applications have to opt into libbpf 1.0 mode by setting > > LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK bit when calling > > libbpf_set_strict_mode(). > > > > Libbpf allows to control the exact amount of set RLIMIT_MEMLOCK limit > > with libbpf_set_memlock_rlim_max() API. Passing 0 will make libbpf do > > nothing with RLIMIT_MEMLOCK. libbpf_set_memlock_rlim_max() has to be > > called before the first bpf_prog_load(), bpf_btf_load(), or > > bpf_object__load() call, otherwise it has no effect and will return > > -EBUSY. > > > > [0] Closes: https://github.com/libbpf/libbpf/issues/369 > > > > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx> > > The probing approach breaks with out-of-order backports, I suppose. > Hopefully no one will do those for that particular patch, though (it's > not really a bugfix), and at least for RHEL we did backport them > together. > > Can't think of any better ways of doing the detection either, but maybe > something to be aware of in the future (i.e., "don't change things in a > way that can't be detected from userspace")? > > Anyway, with the nits below: > > Acked-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c > > index 6b2407e12060..7c82136979bf 100644 > > --- a/tools/lib/bpf/bpf.c > > +++ b/tools/lib/bpf/bpf.c > > [...] > > > + /* attempt loading freplace trying to use custom BTF */ > > + memset(&attr, 0, bpf_load_attr_sz); > > + attr.prog_type = BPF_PROG_TYPE_TRACING; > > + attr.expected_attach_type = BPF_TRACE_FENTRY; > > This comment also seems to be disagreeing with the code it's commenting > on? > > [...] > > > +static bool memlock_bumped; > > +static rlim_t memlock_rlim_max = RLIM_INFINITY; > > + > > +int libbpf_set_memlock_rlim_max(size_t memlock_max) > > +{ > > + if (memlock_bumped) > > + return libbpf_err(-EBUSY); > > + > > + memlock_rlim_max = memlock_max; > > + return 0; > > +} > > + > > +int bump_rlimit_memlock(void) > > +{ > > + struct rlimit rlim; > > + > > + /* this the default in libbpf 1.0, but for now user has to opt-in explicitly */ > > + if (!(libbpf_mode & LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK)) > > + return 0; > > + > > + /* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */ > > + if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT)) > > + return 0; > > + > > + memlock_bumped = true; > > + > > + /* zero memlock_rlim_max disables auto-bumping RLIMIT_MEMLOCK */ > > + if (memlock_rlim_max == 0) > > + return 0; > > + > > + rlim.rlim_cur = rlim.rlim_max = memlock_rlim_max; > > + if (setrlimit(RLIMIT_MEMLOCK, &rlim)) > > + return -errno; > > + > > + return 0; > > +} > > + > > "rlim_max" seems to imply this will only ever increase the limit, but if > I'm reading the code correctly it could actually end up lowering the > effective limit? _max suffix is confusing to me as well. libbpf_set_memlock_rlim() would be more accurate. The other part of the patch: + /* assume memcg accounting is supported if we get -ENOTSUPP */ + err = errno == 524 /* ENOTSUPP */ ? 1 : 0; is dangerous. ENOTSUPP is not a standard error code. The kernel returns it from a few places, but we might fix all of them eventually and return EOPNOTSUPP everywhere. In other words user space cannot rely on this 524 error code number.