Re: [PATCH bpf-next 1/2] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 16, 2024 at 11:42 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Nov 15, 2024 at 05:48:53PM -0800, Alexei Starovoitov wrote:
> > +static inline struct page *try_alloc_page_noprof(int nid)
> > +{
> > +     /* If spin_locks are not held and interrupts are enabled, use normal path. */
> > +     if (preemptible())
> > +             return alloc_pages_node_noprof(nid, GFP_NOWAIT | __GFP_ZERO, 0);
>
> This isn't right for PREEMPT_RT, spinlock_t will be preemptible, but you
> very much do not want regular allocation calls while inside the
> allocator itself for example.

I'm aware that spinlocks are preemptible in RT.
Here is my understanding of why the above is correct...
- preemptible() means that IRQs are not disabled and preempt_count == 0.

- All page alloc operations are protected either by
pcp_spin_trylock() or by spin_lock_irqsave(&zone->lock, flags)
or both together.

- In non-RT spin_lock_irqsave disables IRQs, so preemptible()
check guarantees that we're not holding zone->lock.
The page alloc logic can hold pcp lock when try_alloc_page() is called,
but it's always using pcp_trylock, so it's still ok to call it
with GFP_NOWAIT. pcp trylock will fail and zone->lock will proceed
to acquire zone->lock.

- In RT spin_lock_irqsave doesn't disable IRQs despite its name.
It calls rt_spin_lock() which calls rcu_read_lock()
which increments preempt_count.
So preemptible() checks guards try_alloc_page() from re-entering
in zone->lock protected code.
And pcp_trylock is trylock.

So I believe the above is safe.

> > +     /*
> > +      * Best effort allocation from percpu free list.
> > +      * If it's empty attempt to spin_trylock zone->lock.
> > +      * Do not specify __GFP_KSWAPD_RECLAIM to avoid wakeup_kswapd
> > +      * that may need to grab a lock.
> > +      * Do not specify __GFP_ACCOUNT to avoid local_lock.
> > +      * Do not warn either.
> > +      */
> > +     return alloc_pages_node_noprof(nid, __GFP_TRYLOCK | __GFP_NOWARN | __GFP_ZERO, 0);
> > +}





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux