Re: [PATCH bpf-next v9 2/6] mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation

Vlastimil Babka <vbabka@xxxxxxx> · Wed, 12 Mar 2025 11:00:20 +0100

On 3/11/25 14:32, Alexei Starovoitov wrote:
> On Tue, Mar 11, 2025 at 3:04 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> On Fri, 21 Feb 2025 18:44:23 -0800 Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>>
>> > Tracing BPF programs execute from tracepoints and kprobes where
>> > running context is unknown, but they need to request additional
>> > memory. The prior workarounds were using pre-allocated memory and
>> > BPF specific freelists to satisfy such allocation requests.
>>
>> The "prior workarounds" sound entirely appropriate.  Because the
>> performance and maintainability of Linux's page allocator is about
>> 1,000,040 times more important than relieving BPF of having to carry a
>> "workaround".
> 
> Please explain where performance and maintainability is affected?
> 
> As far as motivation, if I recall correctly, you were present in
> the room when Vlastimil presented the next steps for SLUB at
> LSFMM back in May of last year.
> A link to memory refresher is in the commit log:
> https://lwn.net/Articles/974138/
> 
> Back then he talked about a bunch of reasons including better
> maintainability of the kernel overall, but what stood out to me
> as the main reason to use SLUB for bpf, objpool, mempool,
> and networking needs is prevention of memory waste.
> All these wrappers of slub pin memory that should be shared.
> bpf, objpool, mempools should be good citizens of the kernel
> instead of stealing the memory. That's the core job of the
> kernel. To share resources. Memory is one such resource.

Yes. Although at that time I envisioned there would still be some reserved
objects set aside for these purposes. The difference would be that they
would be under the control of the allocator and not in multiple caches
outside of it.

But if we can achieve the same without such reserved objects, I think it's
even better. Performance and maintainability don't necessarily need to
suffer. Maybe they can even improve in the process. E.g. if we build upon
patches 1+4 and switch memcg stock locking to the non-irqsave variant, we
should avoid some overhead there (something similar was tried there in the
past but reverted when making it RT compatible).