On 2024-12-09 18:39:31 [-0800], Alexei Starovoitov wrote: > From: Alexei Starovoitov <ast@xxxxxxxxxx> > > Tracing BPF programs execute from tracepoints and kprobes where running > context is unknown, but they need to request additional memory. > The prior workarounds were using pre-allocated memory and BPF specific > freelists to satisfy such allocation requests. Instead, introduce > __GFP_TRYLOCK flag that makes page allocator accessible from any context. > It relies on percpu free list of pages that rmqueue_pcplist() should be > able to pop the page from. If it fails (due to IRQ re-entrancy or list > being empty) then try_alloc_pages() attempts to spin_trylock zone->lock > and refill percpu freelist as normal. > BPF program may execute with IRQs disabled and zone->lock is sleeping in RT, > so trylock is the only option. The __GFP_TRYLOCK flag looks reasonable given the challenges for BPF where it is not known how much memory will be needed and what the calling context is. I hope it does not spread across the kernel where people do ATOMIC in preempt/ IRQ-off on PREEMPT_RT and then once they learn that this does not work, add this flag to the mix to make it work without spending some time on reworking it. Side note: I am in the process of hopefully getting rid of the preempt_disable() from trace points. What remains then is attaching BPF programs to any code/ function with a raw_spinlock_t and I am not yet sure what to do here. Sebastian