On Tue, Jan 14, 2025 at 2:31 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 13-01-25 18:19:17, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >
> > Tracing BPF programs execute from tracepoints and kprobes where the
> > running context is unknown, but they need to request additional
> > memory. Prior workarounds used pre-allocated memory and BPF-specific
> > freelists to satisfy such allocation requests.
> > Instead, introduce a gfpflags_allow_spinning() condition that signals
> > to the allocator that the running context is unknown.
> > Then rely on the percpu free list of pages to allocate a page.
> > rmqueue_pcplist() should be able to pop a page from it.
> > If that fails (due to IRQ re-entrancy or the list being empty), then
> > try_alloc_pages() attempts to spin_trylock zone->lock
> > and refill the percpu freelist as normal.
> > A BPF program may execute with IRQs disabled, and zone->lock is a
> > sleeping lock on RT, so trylock is the only option. In theory we could
> > introduce a percpu reentrance counter and increment it every time
> > spin_lock_irqsave(&zone->lock, flags) is used, but we cannot rely
> > on it. Even if this CPU is not in the page_alloc path,
> > spin_lock_irqsave() is not safe, since the BPF prog might be called
> > from a tracepoint where preemption is disabled. So trylock only.
> >
> > Note: free_page and memcg are not taught about the
> > gfpflags_allow_spinning() condition. That support comes in the next
> > patches.
> >
> > This is a first step towards supporting BPF requirements in SLUB
> > and getting rid of bpf_mem_alloc.
> > That goal was discussed at LSFMM: https://lwn.net/Articles/974138/
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>
> LGTM, I am not entirely clear on the kmsan_alloc_page part though.

Which part is still confusing? I hoped the comment below was enough:

 * Specify __GFP_ZERO to make sure that call to kmsan_alloc_page() below
 * is safe in any context. Also zeroing the page is mandatory for
 * BPF use cases.

and once you zoom into:

void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
{
	bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
	...
	if (initialized) {
		__memset(page_address(shadow), 0, PAGE_SIZE * pages);
		__memset(page_address(origin), 0, PAGE_SIZE * pages);
		return;
	}

So it's safe to call it, and it's necessary to call it when KMSAN is on.
This was the easiest code path to analyze from a doesn't-take-spinlocks
point of view. I feel the comment is enough.
If/when people want to support the !__GFP_ZERO case with KMSAN, they
would need to make stack_depot_save() behave under the
!gfpflags_allow_spinning() condition.
Since __GFP_ZERO is necessary for the BPF use case, I left all the extra
work for future follow-ups.

> As long as that part is correct you can add
> Acked-by: Michal Hocko <mhocko@xxxxxxxx>
>
> Other than that try_alloc_pages_noprof begs some user documentation.
>
> /**
>  * try_alloc_pages_noprof - opportunistic reentrant allocation from any context
>  * @nid - node to allocate from
>  * @order - allocation order size
>  *
>  * Allocates pages of a given order from the given node. This is safe to
>  * call from any context (atomic, NMI, but also reentrant
>  * allocator -> tracepoint -> try_alloc_pages_noprof).
>  * Allocation is best effort and expected to fail easily, so nobody should
>  * rely on its success. Failures are not reported via warn_alloc().
>  *
>  * Return: allocated page or NULL on failure.
>  */

Nicely worded. Will add.

Thanks for all the reviews. Appreciate it!
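
P.S. For anyone following the thread, here is a minimal sketch of the
trylock-only idea from the commit message. It is not the upstream
rmqueue()/try_alloc_pages() code: refill_pcplist_sketch() is a made-up
name for illustration, while gfpflags_allow_spinning(),
spin_trylock_irqsave(), spin_lock_irqsave() and spin_unlock_irqrestore()
are the real kernel helpers referenced above.

/* Sketch only, not the upstream implementation. */
static bool refill_pcplist_sketch(struct zone *zone, gfp_t gfp_flags)
{
	unsigned long flags;

	if (!gfpflags_allow_spinning(gfp_flags)) {
		/*
		 * Unknown context (e.g. BPF prog on a tracepoint): IRQs may
		 * be disabled and zone->lock is a sleeping lock on RT, so
		 * only a trylock is safe and the caller must accept failure.
		 */
		if (!spin_trylock_irqsave(&zone->lock, flags))
			return false;
	} else {
		/* Normal contexts may spin on zone->lock as usual. */
		spin_lock_irqsave(&zone->lock, flags);
	}

	/* ... move pages from the buddy lists onto the percpu free list ... */

	spin_unlock_irqrestore(&zone->lock, flags);
	return true;
}

On the trylock-failure path the allocation simply fails, matching the
best-effort "NULL on failure" contract in the kernel-doc above.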