On 7/6/22 19:43, Alexei Starovoitov wrote:
> On Mon, Jul 04, 2022 at 06:13:17PM +0200, Vlastimil Babka wrote:
>>
>> > On RT fast path == slow path with a lock.
>> > On !RT fast path is lock less.
>> > That's all correct.
>> > bpf side has to make sure safety in all possible paths
>> > therefore RT or !RT makes no difference.
>>
>> So AFAIK we don't right now have what BPF needs - an extra-constrained kind
>> of GFP_ATOMIC. I don't object to you adding it privately. But it's another
>> reason to think about if these things can be generalized. For example we had
>> a discussion about the Maple tree having kinda similar kinds of requirements
>> to avoid its tree node preallocations always for the worst possible case.
>
> What kind of maple tree needs? Does it need to be fully reentrant and nmi safe?
> Not really. The caller knows the context and can choose appropriate flags.
> While bpf alloc doesn't know the context. The bpf prog can be called from
> places where slab/page/kasan specific locks are held which makes all these
> pieces non-reentrant.

Sure, the context restrictions can differ between bpf, maple tree and other
users, but I think there's a common need not to be dependent on slab/page
allocator implementation internals and their locking. So the common
allocator/cache on top would need to be implemented in a way that supports
the most restricted context (e.g. bpf), thus be lockless and whatnot. But
then the individual users would be able to specify different details such as
- how much to preallocate in order to not run out of the cache
- what is allowed if we run out of cache - only async refill (bpf?) or also
  e.g. GFP_NOWAIT for less restricted users?

> The full prealloc of bpf maps (read: waste a lot of memory) was our solution until now.
> This is specific to tracing bpf programs, of course.
> bpf networking, bpf security, sleepable bpf are completely different.
>
>> I'm not sure we can sanely implement this within each of SLAB/SLUB/SLOB, or
>> rather provide a generic cache on top...
>
> Notice that all of bpf cache functions are notrace/nokprobe/no locks.
> The main difference vs all other allocators is bpf_mem_alloc from cache
> and refill of the cache are two asynchronous operations. It allows the former
> to be reentrant and nmi safe.
> All in tree allocators sooner or later synchronously call into page_alloc,
> kasan, memleak and other debugging facilities that grab locks.
>
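
To make the alloc/refill split and the per-user knobs a bit more concrete,
here is a rough userspace sketch in C. It is not the bpf_mem_alloc code and
not a proposed kernel API: every name in it (obj_cache, empty_policy,
cache_alloc, cache_refill, ...) is hypothetical, malloc() stands in for the
slab/page allocator, and the refill_pending flag stands in for queuing
deferred work. It is single-threaded on purpose; a real version would need
per-CPU lists and an interruption-safe pop to actually be reentrant and NMI
safe, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the bpf_mem_alloc code. */

enum empty_policy {
	EMPTY_FAIL,       /* most restricted users: fail now, refill later */
	EMPTY_TRY_DIRECT, /* less restricted users: try the backing allocator */
};

struct node {
	struct node *next;
};

struct obj_cache {
	struct node *free_list; /* would be per-CPU and lockless in the kernel */
	size_t obj_size;
	int count;
	int low_watermark;      /* request a refill once count drops below this */
	int batch;              /* objects added per refill */
	bool refill_pending;    /* stands in for queued deferred work */
	enum empty_policy policy;
};

/* Slow side: pulls up to nr objects from the backing allocator. */
int cache_add(struct obj_cache *c, int nr)
{
	int added = 0;

	while (added < nr) {
		struct node *n = malloc(c->obj_size);

		if (!n)
			break;
		n->next = c->free_list;
		c->free_list = n;
		c->count++;
		added++;
	}
	return added;
}

void cache_init(struct obj_cache *c, size_t obj_size, int prealloc,
		int low_watermark, int batch, enum empty_policy policy)
{
	assert(obj_size >= sizeof(struct node));
	c->free_list = NULL;
	c->obj_size = obj_size;
	c->count = 0;
	c->low_watermark = low_watermark;
	c->batch = batch;
	c->refill_pending = false;
	c->policy = policy;
	cache_add(c, prealloc);
}

/*
 * Fast side: pops from the cache. It only touches the backing allocator
 * when the cache is empty and the user opted in with EMPTY_TRY_DIRECT.
 */
void *cache_alloc(struct obj_cache *c)
{
	struct node *n = c->free_list;

	if (n) {
		c->free_list = n->next;
		c->count--;
	}
	if (c->count < c->low_watermark)
		c->refill_pending = true; /* refill happens later, not here */
	if (!n && c->policy == EMPTY_TRY_DIRECT)
		return malloc(c->obj_size);
	return n;
}

/* Freed objects go back onto the cache's list. */
void cache_free(struct obj_cache *c, void *obj)
{
	struct node *n = obj;

	n->next = c->free_list;
	c->free_list = n;
	c->count++;
}

/* Deferred refill; meant to run later from a context where allocating is safe. */
void cache_refill(struct obj_cache *c)
{
	if (!c->refill_pending)
		return;
	cache_add(c, c->batch);
	c->refill_pending = false;
}
```

With EMPTY_FAIL a caller that drains the cache gets NULL until
cache_refill() has run, which is the "only async refill" case. With
EMPTY_TRY_DIRECT the empty case falls back to the backing allocator, which
is roughly what a GFP_NOWAIT attempt would be for a less restricted user.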