Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Dec 11, 2024 at 12:39 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 12/10/24 22:53, Alexei Starovoitov wrote:
> > On Tue, Dec 10, 2024 at 1:01 AM Sebastian Andrzej Siewior
> > <bigeasy@xxxxxxxxxxxxx> wrote:
> >>
> >> On 2024-12-09 18:39:31 [-0800], Alexei Starovoitov wrote:
> >> > From: Alexei Starovoitov <ast@xxxxxxxxxx>
> >> >
> >> > Tracing BPF programs execute from tracepoints and kprobes where running
> >> > context is unknown, but they need to request additional memory.
> >> > The prior workarounds were using pre-allocated memory and BPF specific
> >> > freelists to satisfy such allocation requests. Instead, introduce
> >> > __GFP_TRYLOCK flag that makes page allocator accessible from any context.
> >> > It relies on percpu free list of pages that rmqueue_pcplist() should be
> >> > able to pop the page from. If it fails (due to IRQ re-entrancy or list
> >> > being empty) then try_alloc_pages() attempts to spin_trylock zone->lock
> >> > and refill percpu freelist as normal.
> >> > BPF program may execute with IRQs disabled and zone->lock is sleeping in RT,
> >> > so trylock is the only option.
> >>
> >> The __GFP_TRYLOCK flag looks reasonable given the challenges for BPF
> >> where it is not known how much memory will be needed and what the
> >> calling context is.
> >
> > Exactly.
> >
> >> I hope it does not spread across the kernel where
> >> people do ATOMIC in preempt/ IRQ-off on PREEMPT_RT and then once they
> >> learn that this does not work, add this flag to the mix to make it work
> >> without spending some time on reworking it.
> >
> > We can call it __GFP_BPF to discourage any other usage,
> > but that seems like an odd "solution" to code review problem.
>
> Could we perhaps not expose the flag to public headers at all, and keep it
> only as an internal detail of try_alloc_pages_noprof()?

public headers?
To pass additional bit via gfp flags into alloc_pages
gfp_types.h has to be touched.

If you mean moving try_alloc_pages() into mm/page_alloc.c and
adding another argument to __alloc_pages_noprof then it's not pretty.
It has 'gfp_t gfp' argument. It should to be used to pass the intent.

We don't have to add GFP_TRYLOCK at all if we go with
memalloc_nolock_save() approach.
So I started looking at it,
but immediately hit trouble with bits.
There are 5 bits left in PF_ and 3 already used for mm needs.
That doesn't look sustainable long term.
How about we alias nolock concept with PF_MEMALLOC_PIN ?

As far as I could trace PF_MEMALLOC_PIN clears GFP_MOVABLE and nothing else.

The same bit plus lack of __GFP_KSWAPD_RECLAIM in gfp flags
would mean nolock mode in alloc_pages,
while PF_MEMALLOC_PIN alone would mean nolock in free_pages
and deeper inside memcg paths and such.

thoughts? too hacky?





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux