On Mon, Jul 04, 2022 at 06:13:17PM +0200, Vlastimil Babka wrote:
> On 6/29/22 04:49, Alexei Starovoitov wrote:
> > On Tue, Jun 28, 2022 at 7:35 PM Christoph Lameter <cl@xxxxxxxxx> wrote:
> >>
> >> On Tue, 28 Jun 2022, Alexei Starovoitov wrote:
> >>
> >> > > That is a relatively new feature due to RT logic support. without RT this
> >> > > would be a simple irq disable.
> >> >
> >> > Not just RT.
> >> > It's a slow path:
> >> >         if (IS_ENABLED(CONFIG_PREEMPT_RT) ||
> >> >             unlikely(!object || !slab || !node_match(slab, node))) {
> >> >                 local_unlock_irqrestore(&s->cpu_slab->lock,...);
> >> > and that's not the only lock in there.
> >> > new_slab->allocate_slab... alloc_pages grabbing more locks.
> >>
> >>
> >> It's not a lock for !RT.
> >>
> >> The fastpath is lockless if hardware allows that but then we go into more
> >> and more serialization needs as the allocation gets more into the page
> >> allocator logic.
>
> Yeah I don't think the recent RT-related changes made this much worse than
> it already was. In alloc side you could perhaps try the really lockless
> fastpaths only and fail if e.g. the per-cpu slabs were empty (but would BPF
> be happy with that?). On the free side though you could end up having to
> move a slab from partial to free list as a result, and now a spin lock is
> needed (even before the RT changes), and you can't really fail a free...
>
> > On RT fast path == slow path with a lock.
> > On !RT fast path is lock less.
>
> That's all correct.
>
> > bpf side has to make sure safety in all possible paths
> > therefore RT or !RT makes no difference.
>
> So AFAIK we don't right now have what BPF needs - an extra-constrained kind
> of GFP_ATOMIC. I don't object you adding it privately. But it's another
> reason to think about if these things can be generalized. For example we had
> a discussion about the Maple tree having kinda similar kinds of requirements
> to avoid its tree node preallocations always for the worst possible case.

What kind of needs does the maple tree have?
Does it need to be fully reentrant and nmi safe?
Not really. The caller knows the context and can choose appropriate flags.
bpf alloc, in contrast, doesn't know the context.
The bpf prog can be called from places where slab/page/kasan specific locks
are held, which makes all these pieces non-reentrant.
The full prealloc of bpf maps (read: waste a lot of memory)
was our solution until now.
This is specific to tracing bpf programs, of course.
bpf networking, bpf security and sleepable bpf are completely different.

> I'm not sure we can sanely implement this within each of SLAB/SLUB/SLOB, or
> rather provide a generic cache on top...

Notice that all of the bpf cache functions are notrace/nokprobe/no locks.
The main difference vs all other allocators is that bpf_mem_alloc from the cache
and refill of the cache are two asynchronous operations.
That allows the former to be reentrant and nmi safe.
All in-tree allocators sooner or later synchronously call into page_alloc,
kasan, memleak and other debugging facilities that grab locks.