On Fri, Dec 10, 2021 at 3:08 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Andrii Nakryiko <andrii@xxxxxxxxxx> writes:
>
> > The need to increase RLIMIT_MEMLOCK to do anything useful with BPF is
> > one of the first extremely frustrating gotchas that all new BPF users go
> > through and in some cases have to learn it a very hard way.
> >
> > Luckily, starting with upstream Linux kernel version 5.11, BPF subsystem
> > dropped the dependency on memlock and uses memcg-based memory accounting
> > instead. Unfortunately, detecting memcg-based BPF memory accounting is
> > far from trivial (as can be evidenced by this patch), so in practice
> > most BPF applications still do unconditional RLIMIT_MEMLOCK increase.
> >
> > As we move towards libbpf 1.0, it would be good to allow users to forget
> > about RLIMIT_MEMLOCK vs memcg and let libbpf do the sensible adjustment
> > automatically. This patch paves the way forward in this matter. Libbpf
> > will do feature detection of memcg-based accounting, and if detected,
> > will do nothing. But if the kernel is too old, just like BCC, libbpf
> > will automatically increase RLIMIT_MEMLOCK on behalf of user
> > application ([0]).
> >
> > As this is technically a breaking change, during the transition period
> > applications have to opt into libbpf 1.0 mode by setting
> > LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK bit when calling
> > libbpf_set_strict_mode().
> >
> > Libbpf allows to control the exact amount of set RLIMIT_MEMLOCK limit
> > with libbpf_set_memlock_rlim_max() API. Passing 0 will make libbpf do
> > nothing with RLIMIT_MEMLOCK. libbpf_set_memlock_rlim_max() has to be
> > called before the first bpf_prog_load(), bpf_btf_load(), or
> > bpf_object__load() call, otherwise it has no effect and will return
> > -EBUSY.
> >
> >   [0] Closes: https://github.com/libbpf/libbpf/issues/369
> >
> > Signed-off-by: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > ---
> >  tools/lib/bpf/bpf.c             | 122 ++++++++++++++++++++++++++++++++
> >  tools/lib/bpf/bpf.h             |   2 +
> >  tools/lib/bpf/libbpf.c          |  47 +++---------
> >  tools/lib/bpf/libbpf.map        |   1 +
> >  tools/lib/bpf/libbpf_internal.h |  39 ++++++++++
> >  tools/lib/bpf/libbpf_legacy.h   |  12 +++-
> >  6 files changed, 184 insertions(+), 39 deletions(-)
> >
> > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > index 4e7836e1a7b5..5b14111b80dd 100644
> > --- a/tools/lib/bpf/bpf.c
> > +++ b/tools/lib/bpf/bpf.c
> > @@ -29,6 +29,7 @@
> >  #include <errno.h>
> >  #include <linux/bpf.h>
> >  #include <limits.h>
> > +#include <sys/resource.h>
> >  #include "bpf.h"
> >  #include "libbpf.h"
> >  #include "libbpf_internal.h"
> > @@ -94,6 +95,119 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
> >         return fd;
> >  }
> >
> > +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> > + * memcg-based memory accounting for BPF maps and progs. This was done in [0],
> > + * but it's not straightforward to detect those changes from the user-space.
> > + * So instead we'll try to detect whether [1] is in the kernel, which follows
> > + * [0] almost immediately and made it into the upstream kernel in the same
> > + * release.
> > + *
> > + * For this, we'll upload a trivial BTF into the kernel and will try to load
> > + * a trivial BPF program with attach_btf_obj_fd pointing to our BTF. If it
> > + * returns anything but -EOPNOTSUPP, we'll assume we still need
> > + * RLIMIT_MEMLOCK.
> > + *
> > + * [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@xxxxxx/
> > + * [1] Fixes: 8bdd8e275ede ("bpf: Return -ENOTSUPP when attaching to non-kernel BTF")
> > + */
> > +int probe_memcg_account(void)
> > +{
> > +       const size_t bpf_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
> > +       const size_t btf_load_attr_sz = offsetofend(union bpf_attr, btf_log_level);
> > +       int prog_fd = -1, btf_fd = -1, err;
> > +       struct bpf_insn insns[1] = {}; /* instructions don't matter */
> > +       const void *btf_data;
> > +       union bpf_attr attr;
> > +       __u32 btf_data_sz;
> > +       struct btf *btf;
> > +
> > +       /* create the simplest BTF object and upload it into the kernel */
> > +       btf = btf__new_empty();
> > +       err = libbpf_get_error(btf);
> > +       if (err)
> > +               return err;
> > +       err = btf__add_int(btf, "int", 4, 0);
> > +       btf_data = btf__raw_data(btf, &btf_data_sz);
> > +       if (!btf_data) {
> > +               err = -ENOMEM;
> > +               goto cleanup;
> > +       }
> > +
> > +       /* we won't use bpf_btf_load() or bpf_prog_load() because they will
> > +        * be trying to detect this feature and will cause an infinite loop
> > +        */
> > +       memset(&attr, 0, btf_load_attr_sz);
> > +       attr.btf = ptr_to_u64(btf_data);
> > +       attr.btf_size = btf_data_sz;
> > +       btf_fd = sys_bpf_fd(BPF_BTF_LOAD, &attr, btf_load_attr_sz);
> > +       if (btf_fd < 0) {
> > +               err = -errno;
> > +               goto cleanup;
> > +       }
> > +
> > +       /* attempt loading freplace trying to use custom BTF */
> > +       memset(&attr, 0, bpf_load_attr_sz);
> > +       attr.prog_type = BPF_PROG_TYPE_TRACING;
> > +       attr.expected_attach_type = BPF_TRACE_FENTRY;
> > +       attr.insns = ptr_to_u64(insns);
> > +       attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
> > +       attr.license = ptr_to_u64("GPL");
> > +       attr.attach_btf_obj_fd = btf_fd;
> > +
> > +       prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, bpf_load_attr_sz);
> > +       if (prog_fd >= 0)
> > +               /* bug? we shouldn't be able to successfully load program */
> > +               err = -EFAULT;
> > +       else
> > +               /* assume memcg accounting is supported if we get -EOPNOTSUPP */
> > +               err = errno == 524 /* ENOTSUPP */ ? 1 : 0;
>
> Nit: comment and code seems to be in disagreement over the name of the
> error code here.

One careful reader there :) There is also a comment about bpf_link_create,
which I removed because BPF_LINK_CREATE didn't depend on memlock. I'll
update both and will send v2, thanks.

> (side note, since we've ended up using ENOTSUPP everywhere in BPF, any
> reason we can't export it in a UAPI header somewhere?)

No opinion here, tbh, but certainly not part of this change.

> -Toke
>