On 12/10/24 03:39, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> Tracing BPF programs execute from tracepoints and kprobes where running
> context is unknown, but they need to request additional memory.
> The prior workarounds were using pre-allocated memory and BPF specific
> freelists to satisfy such allocation requests. Instead, introduce
> __GFP_TRYLOCK flag that makes page allocator accessible from any context.
> It relies on percpu free list of pages that rmqueue_pcplist() should be
> able to pop the page from. If it fails (due to IRQ re-entrancy or list
> being empty) then try_alloc_pages() attempts to spin_trylock zone->lock
> and refill percpu freelist as normal.
> BPF program may execute with IRQs disabled and zone->lock is sleeping in RT,
> so trylock is the only option.
> In theory we can introduce percpu reentrance counter and increment it
> every time spin_lock_irqsave(&zone->lock, flags) is used,
> but we cannot rely on it. Even if this cpu is not in page_alloc path
> the spin_lock_irqsave() is not safe, since BPF prog might be called
> from tracepoint where preemption is disabled. So trylock only.
>
> Note, free_page and memcg are not taught about __GFP_TRYLOCK yet.
> The support comes in the next patches.
>
> This is a first step towards supporting BPF requirements in SLUB
> and getting rid of bpf_mem_alloc.
> That goal was discussed at LSFMM: https://lwn.net/Articles/974138/
>
> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>

I think there might be more non-try spin_locks reachable from page
allocations:

- in reserve_highatomic_pageblock(), which I think is reachable unless
  this is limited to order-0
- in try_to_accept_memory_one()
- as part of post_alloc_hook(), in set_page_owner(): the stack depot
  might do raw_spin_lock_irqsave(), is that one ok?

I hope I didn't miss anything else, especially in those other debugging
hooks (KASAN etc.).
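
To make the intended semantics concrete, here is a rough userspace model
of the scheme the quoted description implies; this is a minimal sketch,
not the actual patch. All names are illustrative stand-ins: p->pcp plays
the role of the percpu list that rmqueue_pcplist() pops from, p->lock
plays zone->lock, and the IRQ re-entrancy detection is elided.

#include <pthread.h>
#include <stddef.h>

struct pool {
	pthread_spinlock_t lock;	/* stands in for zone->lock */
	void *shared[64];		/* guarded by lock */
	int nr_shared;
	void *pcp[8];			/* percpu-style cache, popped locklessly */
	int nr_pcp;
};

static void pool_init(struct pool *p)
{
	p->nr_shared = 0;
	p->nr_pcp = 0;
	pthread_spin_init(&p->lock, PTHREAD_PROCESS_PRIVATE);
}

/*
 * Never blocks: pop from the local cache if possible, otherwise refill
 * under trylock.  A caller in unknown context (the analogue of a BPF
 * prog at a tracepoint) gets NULL instead of a deadlock or a sleep.
 */
static void *try_alloc(struct pool *p)
{
	void *obj = NULL;

	if (p->nr_pcp > 0)			/* fast path, no lock */
		return p->pcp[--p->nr_pcp];

	if (pthread_spin_trylock(&p->lock) != 0)
		return NULL;			/* contended: fail, don't spin */
	if (p->nr_shared > 0)
		obj = p->shared[--p->nr_shared];
	pthread_spin_unlock(&p->lock);
	return obj;
}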
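
The "never blocks" property is exactly what the three paths listed above
would break. Continuing the same sketch, a callee of the refill path
that takes a non-try lock looks like the function below; the may_block
parameter is a hypothetical opt-out, shown only to illustrate the shape
a fix would need (either a per-callee trylock or skipping the work in
try-only mode).

#include <stdbool.h>

/* Analogue of a hook like set_page_owner() -> stack depot grabbing a
 * non-try lock while the allocation is already in try-only mode. */
static void record_alloc(pthread_spinlock_t *debug_lock, void *obj,
			 bool may_block)
{
	(void)obj;

	if (!may_block)
		return;		/* hypothetical opt-out for the trylock path */

	/*
	 * Unconditional lock: if this is reachable from try_alloc()
	 * above, the trylock guarantee is already gone.  This is the
	 * shape of all three paths listed above.
	 */
	pthread_spin_lock(debug_lock);
	/* ... record allocation metadata ... */
	pthread_spin_unlock(debug_lock);
}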