On Thu, Jun 9, 2022 at 11:13 AM <sdf@xxxxxxxxxx> wrote: > > On 06/09, Quentin Monnet wrote: > > To ensure that memory accounting will not hinder the load of BPF > > objects, libbpf may raise the memlock rlimit before proceeding to some > > operations. Whether this limit needs to be raised depends on the version > > of the kernel: newer versions use cgroup-based (memcg) memory > > accounting, and do not require any adjustment. > > > There is a probe in libbpf to determine whether memcg-based accounting > > is supported. But this probe currently relies on the availability of a > > given BPF helper, bpf_ktime_get_coarse_ns(), which landed in the same > > kernel version as the memory accounting change. This works in the > > generic case, but it may fail, for example, if the helper function has > > been backported to an older kernel. This has been observed for Google > > Cloud's Container-Optimized OS (COS), where the helper is available but > > rlimit is still in use. The probe succeeds, the rlimit is not raised, > > and probing features with bpftool, for example, fails. > > > Here we attempt to improve this probe and to effectively rely on memory > > accounting. Function probe_memcg_account() in libbpf is updated to set > > the rlimit to 0, then attempt to load a BPF object, and then to reset > > the rlimit. If the load still succeeds, then this means we're running > > with memcg-based accounting. > > > This probe was inspired by the similar one from the cilium/ebpf Go > > library [0]. > > > [0] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39 > > > Signed-off-by: Quentin Monnet <quentin@xxxxxxxxxxxxx> > > --- > > tools/lib/bpf/bpf.c | 23 ++++++++++++++++++----- > > 1 file changed, 18 insertions(+), 5 deletions(-) > > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c > > index 240186aac8e6..781387e6f66b 100644 > > --- a/tools/lib/bpf/bpf.c > > +++ b/tools/lib/bpf/bpf.c > > @@ -99,31 +99,44 @@ static inline int sys_bpf_prog_load(union bpf_attr > > *attr, unsigned int size, int > > > /* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to > > * memcg-based memory accounting for BPF maps and progs. This was done > > in [0]. > > - * We use the support for bpf_ktime_get_coarse_ns() helper, which was > > added in > > - * the same 5.11 Linux release ([1]), to detect memcg-based accounting > > for BPF. > > + * To do so, we lower the soft memlock rlimit to 0 and attempt to create > > a BPF > > + * object. If it succeeds, then memcg-based accounting for BPF is > > available. > > * > > * [0] > > https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@xxxxxx/ > > - * [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper") > > */ > > int probe_memcg_account(void) > > { > > const size_t prog_load_attr_sz = offsetofend(union bpf_attr, > > attach_btf_obj_fd); > > struct bpf_insn insns[] = { > > - BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns), > > BPF_EXIT_INSN(), > > }; > > + struct rlimit rlim_init, rlim_cur_zero = {}; > > size_t insn_cnt = ARRAY_SIZE(insns); > > union bpf_attr attr; > > int prog_fd; > > > - /* attempt loading freplace trying to use custom BTF */ > > memset(&attr, 0, prog_load_attr_sz); > > attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER; > > attr.insns = ptr_to_u64(insns); > > attr.insn_cnt = insn_cnt; > > attr.license = ptr_to_u64("GPL"); > > > + if (getrlimit(RLIMIT_MEMLOCK, &rlim_init)) > > + return -1; > > + > > + /* Drop the soft limit to zero. We maintain the hard limit to its > > + * current value, because lowering it would be a permanent operation > > + * for unprivileged users. > > + */ > > + rlim_cur_zero.rlim_max = rlim_init.rlim_max; > > + if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero)) > > + return -1; > > + > > prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, prog_load_attr_sz); > > + > > + /* reset soft rlimit as soon as possible */ > > + setrlimit(RLIMIT_MEMLOCK, &rlim_init); > > Isn't that adding more flakiness to the other daemons running as > the same user? Also, there might be surprises if another daemon that > has libbpf in it starts right when we've set the limit temporarily to zero. > I agree, it briefly changes global process state and can introduce some undesirable (and very non-obvious) side effects. > Can we push these decisions to the users as part of libbpf 1.0 cleanup? Quentin, at least for bpftool, I think it's totally fine to just always bump RLIMIT_MEMLOCK to avoid this issue. That would solve the issue with probing, right? And for end applications I think I agree with Stanislav that application might need to ensure rlimit bumping for such backported changes. > > > + > > if (prog_fd >= 0) { > > close(prog_fd); > > return 1; > > -- > > 2.34.1 >