Re: [PATCH bpf-next 0/5] bpf: BPF specific memory allocator.

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 19 Jul 2022 13:52:35 +0200



On 7/6/22 19:43, Alexei Starovoitov wrote:
> On Mon, Jul 04, 2022 at 06:13:17PM +0200, Vlastimil Babka wrote:
>> 
>> > On RT fast path == slow path with a lock.
>> > On !RT fast path is lock less.
>> > That's all correct.
>> > bpf side has to make sure safety in all possible paths
>> > therefore RT or !RT makes no difference.
>> 
>> So AFAIK we don't right now have what BPF needs - an extra-constrained kind
>> of GFP_ATOMIC. I don't object to you adding it privately. But it's another
>> reason to think about if these things can be generalized. For example we had
>> a discussion about the Maple tree having kinda similar kinds of requirements
>> to avoid its tree node preallocations always for the worst possible case.
> 
> What kind of maple tree needs? Does it need to be fully reentrant and nmi safe?
> Not really. The caller knows the context and can choose appropriate flags.
> While bpf alloc doesn't know the context. The bpf prog can be called from
> places where slab/page/kasan specific locks are held which makes all these
> pieces non-reentrant.

Sure, the context restrictions can differ between bpf, maple tree and other
users, but I think there's a common need not to be dependent on slab/page
allocator implementation internals and its locking. So the common
allocator/cache on top would need to be implemented in a way to support the
most restricted context (e.g. bpf), thus be lockless and whatnot.
But then the individual users would be able to specify different details such as
- how much to preallocate in order to not run out of the cache
- what is allowed if we run out of cache - only async refill (bpf?) or also
e.g. GFP_NOWAIT for less restricted users?
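
To make the two knobs above concrete, here is a very rough userspace sketch.
All names (obj_cache, obj_cache_cfg, cache_empty_policy, ...) are made up for
illustration and are not an existing kernel API. The free list is a plain
array and is NOT lockless, malloc() stands in for the underlying allocator,
and the refill_pending flag stands in for whatever deferred mechanism would
really be used. It only models the per-user policy, not the hard part.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: what a user allows when the cache runs dry. */
enum cache_empty_policy {
	CACHE_EMPTY_ASYNC_ONLY,	/* fail now, only request a deferred refill */
	CACHE_EMPTY_TRY_NOWAIT,	/* may also try a non-sleeping allocation inline */
};

/* Hypothetical per-user configuration. */
struct obj_cache_cfg {
	size_t obj_size;
	unsigned int prealloc;		/* objects to keep preallocated */
	unsigned int low_watermark;	/* request a refill at or below this */
	enum cache_empty_policy on_empty;
};

struct obj_cache {
	struct obj_cache_cfg cfg;
	void **slots;		/* simple stack of free objects (not lockless) */
	unsigned int nr_free;
	bool refill_pending;	/* stand-in for queueing deferred work */
};

static int obj_cache_init(struct obj_cache *c, const struct obj_cache_cfg *cfg)
{
	unsigned int i;

	c->cfg = *cfg;
	c->nr_free = 0;
	c->refill_pending = false;
	c->slots = calloc(cfg->prealloc ? cfg->prealloc : 1, sizeof(void *));
	if (!c->slots)
		return -1;

	for (i = 0; i < cfg->prealloc; i++) {
		void *obj = malloc(cfg->obj_size);

		if (!obj)
			break;
		c->slots[c->nr_free++] = obj;
	}
	return 0;
}

/* Allocation never refills synchronously; it only flags that a refill is due. */
static void *obj_cache_alloc(struct obj_cache *c)
{
	if (c->nr_free > 0) {
		void *obj = c->slots[--c->nr_free];

		if (c->nr_free <= c->cfg.low_watermark)
			c->refill_pending = true;
		return obj;
	}

	c->refill_pending = true;
	if (c->cfg.on_empty == CACHE_EMPTY_TRY_NOWAIT)
		return malloc(c->cfg.obj_size);	/* stand-in for a GFP_NOWAIT attempt */
	return NULL;
}

/* Deferred refill, meant to run later from a context where allocating is safe. */
static void obj_cache_refill(struct obj_cache *c)
{
	while (c->nr_free < c->cfg.prealloc) {
		void *obj = malloc(c->cfg.obj_size);

		if (!obj)
			break;
		c->slots[c->nr_free++] = obj;
	}
	c->refill_pending = false;
}
```

In this sketch a bpf-like user would pick CACHE_EMPTY_ASYNC_ONLY with a
generous prealloc, while a less restricted user could pick
CACHE_EMPTY_TRY_NOWAIT with a smaller one.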

> The full prealloc of bpf maps (read: waste a lot of memory) was our solution until now.
> This is specific to tracing bpf programs, of course.
> bpf networking, bpf security, sleepable bpf are completely different.
> 
>> I'm not sure we can sanely implement this within each of SLAB/SLUB/SLOB, or
>> rather provide a generic cache on top...
> 
> Notice that all of bpf cache functions are notrace/nokprobe/no locks.
> The main difference vs all other allocators is bpf_mem_alloc from cache
> and refill of the cache are two asynchronous operations. It allows the former
> to be reentrant and nmi safe.
> All in-tree allocators sooner or later synchronously call into page_alloc,
> kasan, memleak and other debugging facilities that grab locks.
>