On Wed, Jun 29, 2022 at 7:13 PM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote: > > Bpftool used to bump the memlock rlimit to make sure to be able to load > BPF objects. After the kernel has switched to memcg-based memory > accounting [0] in 5.11, bpftool has relied on libbpf to probe the system > for memcg-based accounting support and for raising the rlimit if > necessary [1]. But this was later reverted, because the probe would > sometimes fail, resulting in bpftool not being able to load all required > objects [2]. > > Here we add a more efficient probe, in bpftool itself. We first lower > the rlimit to 0, then we attempt to load a BPF object (and finally reset > the rlimit): if the load succeeds, then memcg-based memory accounting is > supported. > > This approach was earlier proposed for the probe in libbpf itself [3], > but given that the library may be used in multithreaded applications, > the probe could have undesirable consequences if one thread attempts to > lock kernel memory while memlock rlimit is at 0. Since bpftool is > single-threaded and the rlimit is process-based, this is fine to do in > bpftool itself. > > This probe was inspired by the similar one from the cilium/ebpf Go > library [4]. > > v2: > - Simply use sizeof(attr) instead of hardcoding a size via > offsetofend(). > - Set r0 = 0 before returning in sample program. > > [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'") > [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK") > [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"") > [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@xxxxxxxxxxxxx/t/#u > [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39 > > Cc: Stanislav Fomichev <sdf@xxxxxxxxxx> > Cc: Yafang Shao <laoar.shao@xxxxxxxxx> > Suggested-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > Signed-off-by: Quentin Monnet <quentin@xxxxxxxxxxxxx> LGTM Acked-by: Yafang Shao <laoar.shao@xxxxxxxxx> > --- > tools/bpf/bpftool/common.c | 71 ++++++++++++++++++++++++++++++++++++-- > 1 file changed, 68 insertions(+), 3 deletions(-) > > diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c > index a0d4acd7c54a..fc8172a4969a 100644 > --- a/tools/bpf/bpftool/common.c > +++ b/tools/bpf/bpftool/common.c > @@ -13,14 +13,17 @@ > #include <stdlib.h> > #include <string.h> > #include <unistd.h> > -#include <linux/limits.h> > -#include <linux/magic.h> > #include <net/if.h> > #include <sys/mount.h> > #include <sys/resource.h> > #include <sys/stat.h> > #include <sys/vfs.h> > > +#include <linux/filter.h> > +#include <linux/limits.h> > +#include <linux/magic.h> > +#include <linux/unistd.h> > + > #include <bpf/bpf.h> > #include <bpf/hashmap.h> > #include <bpf/libbpf.h> /* libbpf_num_possible_cpus */ > @@ -73,11 +76,73 @@ static bool is_bpffs(char *path) > return (unsigned long)st_fs.f_type == BPF_FS_MAGIC; > } > > +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to > + * memcg-based memory accounting for BPF maps and programs. This was done in > + * commit 97306be45fbe ("Merge branch 'switch to memcg-based memory > + * accounting'"), in Linux 5.11. > + * > + * Libbpf also offers to probe for memcg-based accounting vs rlimit, but does > + * so by checking for the availability of a given BPF helper and this has > + * failed on some kernels with backports in the past, see commit 6b4384ff1088 > + * ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK""). > + * Instead, we can probe by lowering the process-based rlimit to 0, trying to > + * load a BPF object, and resetting the rlimit. If the load succeeds then > + * memcg-based accounting is supported. > + * > + * This would be too dangerous to do in the library, because multithreaded > + * applications might attempt to load items while the rlimit is at 0. Given > + * that bpftool is single-threaded, this is fine to do here. > + */ > +static bool known_to_need_rlimit(void) > +{ > + struct rlimit rlim_init, rlim_cur_zero = {}; > + struct bpf_insn insns[] = { > + BPF_MOV64_IMM(BPF_REG_0, 0), > + BPF_EXIT_INSN(), > + }; > + size_t insn_cnt = ARRAY_SIZE(insns); > + union bpf_attr attr; > + int prog_fd, err; > + > + memset(&attr, 0, sizeof(attr)); > + attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER; > + attr.insns = ptr_to_u64(insns); > + attr.insn_cnt = insn_cnt; > + attr.license = ptr_to_u64("GPL"); > + > + if (getrlimit(RLIMIT_MEMLOCK, &rlim_init)) > + return false; > + > + /* Drop the soft limit to zero. We maintain the hard limit to its > + * current value, because lowering it would be a permanent operation > + * for unprivileged users. > + */ > + rlim_cur_zero.rlim_max = rlim_init.rlim_max; > + if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero)) > + return false; > + > + /* Do not use bpf_prog_load() from libbpf here, because it calls > + * bump_rlimit_memlock(), interfering with the current probe. > + */ > + prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr)); > + err = errno; > + > + /* reset soft rlimit to its initial value */ > + setrlimit(RLIMIT_MEMLOCK, &rlim_init); > + > + if (prog_fd < 0) > + return err == EPERM; > + > + close(prog_fd); > + return false; > +} > + > void set_max_rlimit(void) > { > struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY }; > > - setrlimit(RLIMIT_MEMLOCK, &rinf); > + if (known_to_need_rlimit()) > + setrlimit(RLIMIT_MEMLOCK, &rinf); > } > > static int > -- > 2.34.1 > -- Regards Yafang