Re: [PATCH bpf] samples/bpf: Set rlimit for memlock to infinity in all samples

Roman Gushchin <guro@xxxxxx> · Tue, 27 Oct 2020 10:00:35 -0700

On Tue, Oct 27, 2020 at 08:14:40AM +0100, Jesper Dangaard Brouer wrote:
> On Tue, 27 Oct 2020 00:36:23 +0100
> Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> 
> > The memlock rlimit is a notorious source of failure for BPF programs. Most
> > of the samples just set it to infinity, but a few used a lower limit. The
> > problem with unconditionally setting a lower limit is that this will also
> > override the limit if the system-wide setting is *higher* than the limit
> > being set, which can lead to failures on systems that lock a lot of memory,
> > but set 'ulimit -l' to unlimited before running a sample.
> > 
> > One fix for this is to only conditionally set the limit if the current
> > limit is lower, but it is simpler to just unify all the samples and have
> > them all set the limit to infinity.
> > 
> > Signed-off-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> 
> This change basically disables the memlock rlimit system, and this way
> of disabling it is becoming standard in more and more BPF programs.
> IMHO using the system-wide memlock rlimit doesn't make sense for BPF.

Hi Jesper,

+1

> 
> I'm still ACKing the patch, as this seems to be the only way forward:
> ignore the memlock rlimit and, in practice, not use it.
> 
> Acked-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> 
> 
> I saw some patches on the list (from Facebook) with a new system for
> policy limiting memory usage per BPF program or was it mem-cgroup, but
> I don't think that was ever merged... I would really like to see
> something replace (and remove) this memlock rlimit dependency. Does
> anyone know what happened to that effort?

I'm working on it.

It required some heavy changes on the mm side: accounting of percpu memory,
which in turn required a framework for accounting arbitrary non-page-sized
objects; support for accounting from interrupt context; and some manipulation
of page flags to allow accounted vmallocs to be mapped to userspace.

It's mostly done, with the last part expected to reach linux-next in a few days.
Then I'll rebase and repost the bpf part.

Thanks!
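
For reference, the "only conditionally set the limit if the current limit
is lower" alternative from the patch description could look roughly like
the sketch below. This is not code from the patch (which simply sets the
limit to infinity in every sample). The helper names pick_memlock_limit()
and raise_memlock_rlimit() are hypothetical and exist only for
illustration; getrlimit()/setrlimit() and RLIMIT_MEMLOCK are the standard
interfaces.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: decide which soft limit to use, given the current
 * soft limit and the amount the program wants. Never returns a value
 * lower than the current limit.
 */
static rlim_t pick_memlock_limit(rlim_t cur, rlim_t want)
{
	if (cur == RLIM_INFINITY)
		return cur;		/* already unlimited, keep it */
	if (want == RLIM_INFINITY)
		return want;		/* caller asked for unlimited */
	return cur >= want ? cur : want;
}

/*
 * Hypothetical helper: raise RLIMIT_MEMLOCK to at least `want` bytes,
 * leaving a higher existing limit untouched. Returns 0 on success and
 * -1 on failure, with errno set by getrlimit()/setrlimit().
 */
static int raise_memlock_rlimit(rlim_t want)
{
	struct rlimit r;
	rlim_t target;

	if (getrlimit(RLIMIT_MEMLOCK, &r))
		return -1;

	target = pick_memlock_limit(r.rlim_cur, want);
	if (target == r.rlim_cur)
		return 0;		/* nothing to do, limit is high enough */

	r.rlim_cur = target;
	/* Raising the hard limit requires CAP_SYS_RESOURCE. */
	if (r.rlim_max != RLIM_INFINITY &&
	    (target == RLIM_INFINITY || r.rlim_max < target))
		r.rlim_max = target;

	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

Calling raise_memlock_rlimit(RLIM_INFINITY) has the same effect as what
the patch does unconditionally; calling it with a finite value avoids
lowering a limit the admin has already raised.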