On Tue, Jan 14, 2025 at 11:19:41AM +0100, Michal Hocko wrote:
> On Tue 14-01-25 10:53:55, Peter Zijlstra wrote:
> > On Mon, Jan 13, 2025 at 06:19:17PM -0800, Alexei Starovoitov wrote:
> > > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> > >
> > > Tracing BPF programs execute from tracepoints and kprobes where
> > > running context is unknown, but they need to request additional
> > > memory.
> > >
> > > The prior workarounds were using pre-allocated memory and
> > > BPF specific freelists to satisfy such allocation requests.
> > > Instead, introduce gfpflags_allow_spinning() condition that signals
> > > to the allocator that running context is unknown.
> > > Then rely on percpu free list of pages to allocate a page.
> > > The rmqueue_pcplist() should be able to pop the page from.
> > > If it fails (due to IRQ re-entrancy or list being empty) then
> > > try_alloc_pages() attempts to spin_trylock zone->lock
> > > and refill percpu freelist as normal.
> > >
> > > BPF program may execute with IRQs disabled and zone->lock is
> > > sleeping in RT, so trylock is the only option.
> >
> > how is spin_trylock() from IRQ context not utterly broken in RT?
>
> +	if (IS_ENABLED(CONFIG_PREEMPT_RT) && (in_nmi() || in_hardirq()))
> +		return NULL;
>
> Deals with that, right?

Changelog didn't really mention that, did it? -- it seems to imply quite
the opposite :/

But maybe, I suppose any BPF program needs to expect failure due to this
being trylock. I just worry some programs will malfunction due to never
succeeding -- and RT getting blamed for this.

Maybe I worry too much.
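
For illustration only, the allocation path discussed above can be
modelled in userspace C roughly as follows. This is a sketch under
stated assumptions, not the kernel code: a C11 atomic_flag stands in
for zone->lock, two plain booleans stand in for
IS_ENABLED(CONFIG_PREEMPT_RT) and in_nmi() || in_hardirq(), and every
name in it (try_alloc_sketch, pcp_pop, zone_trylock, ...) is
hypothetical. It shows the three ways the call can return NULL: the
RT hardirq/NMI bail-out, a contended lock, and an exhausted zone.

```c
/* Userspace model only. All names are hypothetical stand-ins. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ZONE_PAGES 16
#define PCP_BATCH  4

static atomic_flag zone_lock = ATOMIC_FLAG_INIT; /* models zone->lock      */
static char zone_pages[ZONE_PAGES][64];          /* models zone free pages */
static size_t zone_next;                         /* next unused zone page  */
static void *pcp_list[PCP_BATCH];                /* models percpu freelist */
static size_t pcp_count;

/* Assumed stand-ins for the kernel's config and context checks. */
static bool sim_preempt_rt;
static bool sim_in_hardirq_or_nmi;

static bool zone_trylock(void)
{
	/* test-and-set returns the old value: false means we took the lock */
	return !atomic_flag_test_and_set_explicit(&zone_lock,
						  memory_order_acquire);
}

static void zone_unlock(void)
{
	atomic_flag_clear_explicit(&zone_lock, memory_order_release);
}

static void *pcp_pop(void)
{
	return pcp_count ? pcp_list[--pcp_count] : NULL;
}

/* Returns a page or NULL. Never spins or sleeps. */
static void *try_alloc_sketch(void)
{
	void *page;

	/* Mirrors the quoted hunk: on RT, refuse in hardirq/NMI context. */
	if (sim_preempt_rt && sim_in_hardirq_or_nmi)
		return NULL;

	/* Fast path: take a page from the per-CPU list if one is there. */
	page = pcp_pop();
	if (page)
		return page;

	/* Slow path: refill, but only if the lock is free right now. */
	if (!zone_trylock())
		return NULL;
	while (pcp_count < PCP_BATCH && zone_next < ZONE_PAGES)
		pcp_list[pcp_count++] = zone_pages[zone_next++];
	zone_unlock();

	return pcp_pop();
}
```

In this model a caller that only ever runs with both sim_* flags set
gets NULL on every call, which is the "never succeeding" case the
reply worries about; any caller has to treat NULL as an ordinary
outcome rather than an error to retry on.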