On Wed, Dec 11, 2024 at 12:39 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > On 12/10/24 22:53, Alexei Starovoitov wrote: > > On Tue, Dec 10, 2024 at 1:01 AM Sebastian Andrzej Siewior > > <bigeasy@xxxxxxxxxxxxx> wrote: > >> > >> On 2024-12-09 18:39:31 [-0800], Alexei Starovoitov wrote: > >> > From: Alexei Starovoitov <ast@xxxxxxxxxx> > >> > > >> > Tracing BPF programs execute from tracepoints and kprobes where running > >> > context is unknown, but they need to request additional memory. > >> > The prior workarounds were using pre-allocated memory and BPF specific > >> > freelists to satisfy such allocation requests. Instead, introduce > >> > __GFP_TRYLOCK flag that makes page allocator accessible from any context. > >> > It relies on percpu free list of pages that rmqueue_pcplist() should be > >> > able to pop the page from. If it fails (due to IRQ re-entrancy or list > >> > being empty) then try_alloc_pages() attempts to spin_trylock zone->lock > >> > and refill percpu freelist as normal. > >> > BPF program may execute with IRQs disabled and zone->lock is sleeping in RT, > >> > so trylock is the only option. > >> > >> The __GFP_TRYLOCK flag looks reasonable given the challenges for BPF > >> where it is not known how much memory will be needed and what the > >> calling context is. > > > > Exactly. > > > >> I hope it does not spread across the kernel where > >> people do ATOMIC in preempt/ IRQ-off on PREEMPT_RT and then once they > >> learn that this does not work, add this flag to the mix to make it work > >> without spending some time on reworking it. > > > > We can call it __GFP_BPF to discourage any other usage, > > but that seems like an odd "solution" to code review problem. > > Could we perhaps not expose the flag to public headers at all, and keep it > only as an internal detail of try_alloc_pages_noprof()? public headers? To pass additional bit via gfp flags into alloc_pages gfp_types.h has to be touched. If you mean moving try_alloc_pages() into mm/page_alloc.c and adding another argument to __alloc_pages_noprof then it's not pretty. It has 'gfp_t gfp' argument. It should to be used to pass the intent. We don't have to add GFP_TRYLOCK at all if we go with memalloc_nolock_save() approach. So I started looking at it, but immediately hit trouble with bits. There are 5 bits left in PF_ and 3 already used for mm needs. That doesn't look sustainable long term. How about we alias nolock concept with PF_MEMALLOC_PIN ? As far as I could trace PF_MEMALLOC_PIN clears GFP_MOVABLE and nothing else. The same bit plus lack of __GFP_KSWAPD_RECLAIM in gfp flags would mean nolock mode in alloc_pages, while PF_MEMALLOC_PIN alone would mean nolock in free_pages and deeper inside memcg paths and such. thoughts? too hacky?