On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote: > > 2022-06-10 09:07 UTC-0700 ~ sdf@xxxxxxxxxx > > On 06/10, Quentin Monnet wrote: > >> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8. > > > >> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of > >> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the > >> kernel has switched to memcg-based memory accounting. Thanks to the > >> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility > >> with other systems and ask libbpf to raise the limit for us if > >> necessary. > > > >> How do we know if memcg-based accounting is supported? There is a probe > >> in libbpf to check this. But this probe currently relies on the > >> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which > >> landed in the same kernel version as the memory accounting change. This > >> works in the generic case, but it may fail, for example, if the helper > >> function has been backported to an older kernel. This has been observed > >> for Google Cloud's Container-Optimized OS (COS), where the helper is > >> available but rlimit is still in use. The probe succeeds, the rlimit is > >> not raised, and probing features with bpftool, for example, fails. > > > >> A patch was submitted [0] to update this probe in libbpf, based on what > >> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to > >> 0, attempt to load a BPF object, and reset the rlimit. But it may induce > >> some hard-to-debug flakiness if another process starts, or the current > >> application is killed, while the rlimit is reduced, and the approach was > >> discarded. > > > >> As a workaround to ensure that the rlimit bump does not depend on the > >> availability of a given helper, we restore the unconditional rlimit bump > >> in bpftool for now. > > > >> [0] > >> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@xxxxxxxxxxxxx/ > >> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39 > > > >> Cc: Yafang Shao <laoar.shao@xxxxxxxxx> > >> Signed-off-by: Quentin Monnet <quentin@xxxxxxxxxxxxx> > >> --- > >> tools/bpf/bpftool/common.c | 8 ++++++++ > >> tools/bpf/bpftool/feature.c | 2 ++ > >> tools/bpf/bpftool/main.c | 6 +++--- > >> tools/bpf/bpftool/main.h | 2 ++ > >> tools/bpf/bpftool/map.c | 2 ++ > >> tools/bpf/bpftool/pids.c | 1 + > >> tools/bpf/bpftool/prog.c | 3 +++ > >> tools/bpf/bpftool/struct_ops.c | 2 ++ > >> 8 files changed, 23 insertions(+), 3 deletions(-) > > > >> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c > >> index a45b42ee8ab0..a0d4acd7c54a 100644 > >> --- a/tools/bpf/bpftool/common.c > >> +++ b/tools/bpf/bpftool/common.c > >> @@ -17,6 +17,7 @@ > >> #include <linux/magic.h> > >> #include <net/if.h> > >> #include <sys/mount.h> > >> +#include <sys/resource.h> > >> #include <sys/stat.h> > >> #include <sys/vfs.h> > > > >> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path) > >> return (unsigned long)st_fs.f_type == BPF_FS_MAGIC; > >> } > > > >> +void set_max_rlimit(void) > >> +{ > >> + struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY }; > >> + > >> + setrlimit(RLIMIT_MEMLOCK, &rinf); > > > > Do you think it might make sense to print to stderr some warning if > > we actually happen to adjust this limit? > > > > if (getrlimit(MEMLOCK) != RLIM_INFINITY) { > > fprintf(stderr, "Warning: resetting MEMLOCK rlimit to > > infinity!\n"); > > setrlimit(RLIMIT_MEMLOCK, &rinf); > > } > > > > ? > > > > Because while it's nice that we automatically do this, this might still > > lead to surprises for some users. OTOH, not sure whether people > > actually read those warnings? :-/ > > I'm not strictly opposed to a warning, but I'm not completely sure this > is desirable. > > Bpftool has raised the rlimit for a long time, it changed only in April, > so I don't think it would come up as a surprise for people who have used > it for a while. I think this is also something that several other > BPF-related applications (BCC I think?, bpftrace, Cilium come to mind) > have been doing too. In this case ignore me and let's continue doing that :-) Btw, eventually we'd still like to stop doing that I'd presume? Should we at some point follow up with something like: if (kernel_version >= 5.11) { don't touch memlock; } ? I guess we care only about <5.11 because of the backports, but 5.11+ kernels are guaranteed to have memcg. I'm not sure whether memlock is used out there in the distros (and especially for root/bpf_capable), so I'm also not sure whether we really care or not. > For new users, I agree the warning may be helpful. But then the message > is likely to appear the very first time a user runs the command - likely > as root - and I fear this might worry people not familiar with rlimits, > who would wonder if they just broke something on their system? Maybe > with a different phrasing. > > Alternatively we could document it in the relevant man pages (not that > people would see it better, but at least it would be mentioned somewhere > if people take the time to read the docs)? What do you think? > > Quentin