On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote: > > 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@xxxxxxxxxx> > > On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@xxxxxxxxxxxxx> wrote: > >> > >> 2022-06-10 09:07 UTC-0700 ~ sdf@xxxxxxxxxx > >>> On 06/10, Quentin Monnet wrote: > >>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8. > >>> > >>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of > >>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the > >>>> kernel has switched to memcg-based memory accounting. Thanks to the > >>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility > >>>> with other systems and ask libbpf to raise the limit for us if > >>>> necessary. > >>> > >>>> How do we know if memcg-based accounting is supported? There is a probe > >>>> in libbpf to check this. But this probe currently relies on the > >>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which > >>>> landed in the same kernel version as the memory accounting change. This > >>>> works in the generic case, but it may fail, for example, if the helper > >>>> function has been backported to an older kernel. This has been observed > >>>> for Google Cloud's Container-Optimized OS (COS), where the helper is > >>>> available but rlimit is still in use. The probe succeeds, the rlimit is > >>>> not raised, and probing features with bpftool, for example, fails. > >>> > >>>> A patch was submitted [0] to update this probe in libbpf, based on what > >>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to > >>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce > >>>> some hard-to-debug flakiness if another process starts, or the current > >>>> application is killed, while the rlimit is reduced, and the approach was > >>>> discarded. > >>> > >>>> As a workaround to ensure that the rlimit bump does not depend on the > >>>> availability of a given helper, we restore the unconditional rlimit bump > >>>> in bpftool for now. > >>> > >>>> [0] > >>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@xxxxxxxxxxxxx/ > >>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39 > >>> > >>>> Cc: Yafang Shao <laoar.shao@xxxxxxxxx> > >>>> Signed-off-by: Quentin Monnet <quentin@xxxxxxxxxxxxx> > >>>> --- > >>>> tools/bpf/bpftool/common.c | 8 ++++++++ > >>>> tools/bpf/bpftool/feature.c | 2 ++ > >>>> tools/bpf/bpftool/main.c | 6 +++--- > >>>> tools/bpf/bpftool/main.h | 2 ++ > >>>> tools/bpf/bpftool/map.c | 2 ++ > >>>> tools/bpf/bpftool/pids.c | 1 + > >>>> tools/bpf/bpftool/prog.c | 3 +++ > >>>> tools/bpf/bpftool/struct_ops.c | 2 ++ > >>>> 8 files changed, 23 insertions(+), 3 deletions(-) > >>> > >>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c > >>>> index a45b42ee8ab0..a0d4acd7c54a 100644 > >>>> --- a/tools/bpf/bpftool/common.c > >>>> +++ b/tools/bpf/bpftool/common.c > >>>> @@ -17,6 +17,7 @@ > >>>> #include <linux/magic.h> > >>>> #include <net/if.h> > >>>> #include <sys/mount.h> > >>>> +#include <sys/resource.h> > >>>> #include <sys/stat.h> > >>>> #include <sys/vfs.h> > >>> > >>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path) > >>>> return (unsigned long)st_fs.f_type == BPF_FS_MAGIC; > >>>> } > >>> > >>>> +void set_max_rlimit(void) > >>>> +{ > >>>> + struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY }; > >>>> + > >>>> + setrlimit(RLIMIT_MEMLOCK, &rinf); > >>> > >>> Do you think it might make sense to print to stderr some warning if > >>> we actually happen to adjust this limit? > >>> > >>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) { > >>> fprintf(stderr, "Warning: resetting MEMLOCK rlimit to > >>> infinity!\n"); > >>> setrlimit(RLIMIT_MEMLOCK, &rinf); > >>> } > >>> > >>> ? > >>> > >>> Because while it's nice that we automatically do this, this might still > >>> lead to surprises for some users. OTOH, not sure whether people > >>> actually read those warnings? :-/ > >> > >> I'm not strictly opposed to a warning, but I'm not completely sure this > >> is desirable. > >> > >> Bpftool has raised the rlimit for a long time, it changed only in April, > >> so I don't think it would come up as a surprise for people who have used > >> it for a while. I think this is also something that several other > >> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind) > >> have been doing too. > > > > In this case ignore me and let's continue doing that :-) > > > > Btw, eventually we'd still like to stop doing that I'd presume? > > Agreed. I was thinking either finding a way to improve the probe in > libbpf, or waiting for some more time until 5.11 gets old, but this may > take years :/ > > > Should > > we at some point follow up with something like: > > > > if (kernel_version >= 5.11) { don't touch memlock; } > > > > ? > > > > I guess we care only about <5.11 because of the backports, but 5.11+ > > kernels are guaranteed to have memcg. > > You mean from uname() and parsing the release? Yes I suppose we could do > that, can do as a follow-up. Yeah, uname-based, I don't think we can do better? Given that probing is problematic as well :-( But idk, up to you. > > I'm not sure whether memlock is used out there in the distros (and > > especially for root/bpf_capable), so I'm also not sure whether we > > really care or not. > > Not sure either. For what it's worth, I've never seen complaints so far > from users about the rlimit being raised (from bpftool or other BPF apps).