Re: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation


 



On Wed, Mar 12, 2025 at 3:00 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 3/11/25 14:32, Alexei Starovoitov wrote:
> > On Tue, Mar 11, 2025 at 3:04 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On Fri, 21 Feb 2025 18:44:23 -0800 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> >>
> >> > Tracing BPF programs execute from tracepoints and kprobes where
> >> > running context is unknown, but they need to request additional
> >> > memory. The prior workarounds were using pre-allocated memory and
> >> > BPF specific freelists to satisfy such allocation requests.
> >>
> >> The "prior workarounds" sound entirely appropriate.  Because the
> >> performance and maintainability of Linux's page allocator is about
> >> 1,000,040 times more important than relieving BPF of having to carry a
> >> "workaround".
> >
> > Please explain where performance and maintainability is affected?
> >
> > As far as motivation, if I recall correctly, you were present in
> > the room when Vlastimil presented the next steps for SLUB at
> > LSFMM back in May of last year.
> > A link to memory refresher is in the commit log:
> > https://lwn.net/Articles/974138/
> >
> > Back then he talked about a bunch of reasons including better
> > maintainability of the kernel overall, but what stood out to me
> > as the main reason to use SLUB for bpf, objpool, mempool,
> > and networking needs is prevention of memory waste.
> > All these wrappers of slub pin memory that should be shared.
> > bpf, objpool, mempools should be good citizens of the kernel
> > instead of stealing the memory. That's the core job of the
> > kernel. To share resources. Memory is one such resource.
>
> Yes. Although at that time I've envisioned there would still be some
> reserved objects set aside for these purposes. The difference would be they
> would be under control of the allocator and not in multiple caches outside
> of it.

Yes. Exactly. So far it looks like we don't have to add a pool of
reserved objects. The per-cpu caches already act as such reserve pools.
In the worst case it will be one global reserve shared by everyone
under mm control. A shrinker may be an option too.
All that extra complexity looks very unlikely at this point.
I'm certainly more optimistic now than when we started down this path
back in November :)
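
To make the intended usage concrete, here is a rough, untested sketch of
what a caller in an unknown context (kprobe, NMI, tracepoint) does with
this series instead of carrying its own freelist. try_alloc_pages() is
what this patch adds; free_pages_nolock() is the lock-free freeing
counterpart from later in the series. The helper names around them are
made up for illustration:

static void *grab_scratch_page(void)
{
	struct page *page;

	/* opportunistic: only trylocks, safe to call from any context */
	page = try_alloc_pages(NUMA_NO_NODE, 0);
	if (!page)
		return NULL;	/* may fail; caller must tolerate that */

	return page_address(page);
}

static void release_scratch_page(void *addr)
{
	/* lock-free freeing path, also safe from any context */
	free_pages_nolock(virt_to_page(addr), 0);
}

The idea is that both calls can be entered from a context where spinning
on zone or per-cpu locks could deadlock, so they only trylock and either
return NULL or defer the work instead of waiting.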

> But if we can achieve the same without such reserved objects, I think it's
> even better. Performance and maintainability don't necessarily need to
> suffer. Maybe they can even improve in the process. E.g. if we build upon
> patches 1+4 and switch memcg stock locking to the non-irqsave variant, we
> should avoid some overhead there (something similar was tried in the
> past but reverted when making it RT compatible).

Sounds like Shakeel is starting to experiment in this area, which is great.
Performance improvements in memcg are certainly very welcome.
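
To illustrate the non-irqsave direction above (purely a sketch with
made-up names; pcp_stock and pcp_stock_trylock() are not the real memcg
stock or the real patch 1 API): replace the irqsave local lock around a
per-cpu stock with a trylock that never spins and never disables irqs,
so the common task-context path gets cheaper, and a restricted context
(NMI, tracing BPF prog) that interrupts the lock holder backs off to the
slow path instead of deadlocking:

/* hypothetical per-cpu stock, not the real memcg_stock */
struct pcp_stock {
	int		locked;		/* 0 = free, 1 = held on this CPU */
	unsigned int	cached_pages;
};
static DEFINE_PER_CPU(struct pcp_stock, pcp_stock);

/* trylock in the spirit of patches 1+4: no spinning, no irqsave */
static bool pcp_stock_trylock(void)
{
	preempt_disable();
	if (this_cpu_cmpxchg(pcp_stock.locked, 0, 1) != 0) {
		/* the holder on this CPU was interrupted: back off */
		preempt_enable();
		return false;
	}
	return true;
}

static void pcp_stock_unlock(void)
{
	this_cpu_write(pcp_stock.locked, 0);
	preempt_enable();
}

static bool consume_pcp_stock(unsigned int nr_pages)
{
	bool ret = false;

	if (!pcp_stock_trylock())
		return false;	/* caller refills via the slow, irq-safe path */

	if (__this_cpu_read(pcp_stock.cached_pages) >= nr_pages) {
		__this_cpu_sub(pcp_stock.cached_pages, nr_pages);
		ret = true;
	}

	pcp_stock_unlock();
	return ret;
}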




