Re: [PATCH bpf-next v2 1/6] mm, bpf: Introduce __GFP_TRYLOCK for opportunistic page allocation

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Thu, 12 Dec 2024 13:57:37 -0800

On Thu, Dec 12, 2024 at 7:35 AM Sebastian Sewior <bigeasy@xxxxxxxxxxxxx> wrote:
>
> On 2024-12-12 16:21:28 [+0100], Michal Hocko wrote:
> > On Thu 12-12-24 16:07:44, Sebastian Sewior wrote:
> > > But since I see in_nmi(). You can't trylock from NMI on RT. The trylock
> > > part is easy but unlock might need to acquire rt_mutex_base::wait_lock
> > > and worst case is to wake a waiter via wake_up_process().
> >
> > Ohh, I didn't realize that. So try_lock would only be safe on
> > raw_spin_lock right?
>
> If NMI is one of the possible calling contexts, yes.

Looks like in NMI context both trylock and unlock are unsafe.

pcp_spin_trylock() calls __rt_spin_trylock() on RT,
which can deadlock inside rt_mutex_slowtrylock().
This part has a potential workaround like:

@@ -102,8 +102,11 @@ static __always_inline int __rt_spin_trylock(spinlock_t *lock)
 {
        int ret = 1;

-       if (unlikely(!rt_mutex_cmpxchg_acquire(&lock->lock, NULL, current)))
+       if (unlikely(!rt_mutex_cmpxchg_acquire(&lock->lock, NULL, current))) {
+               if (in_nmi())
+                       return 0;
                ret = rt_mutex_slowtrylock(&lock->lock);
+       }

but on the unlock side, when there are waiters and the cmpxchg in this part fails:
        if (unlikely(!rt_mutex_cmpxchg_release(&lock->lock, current, NULL)))
                rt_mutex_slowunlock(&lock->lock);

it will trigger the slow unlock path, which is impossible to do from NMI.
We can punt it to irq_work with IRQ_WORK_HARD_IRQ to make sure
it runs as soon as the NMI finishes.
Since that runs in hard irq context, debug_rt_mutex_unlock(lock) shouldn't complain.
Will 'current' stay the same there?
Other ideas?
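
To make the proposed flow easier to poke at, here is a rough user-space
model of it in plain C11. It is not kernel code: every toy_* name, the
toy_in_nmi flag and the unlock_deferred field are made up for illustration,
and unlock_deferred only stands in for an irq_work queued with
IRQ_WORK_HARD_IRQ. It shows only the control flow: the fast-path cmpxchg
is the sole thing attempted from NMI, and a failed unlock cmpxchg is
deferred instead of entering the slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: owner word holds a task id; bit 0 means "has waiters". */
#define HAS_WAITERS ((uintptr_t)1)

struct toy_lock {
	_Atomic uintptr_t owner;   /* task id, possibly ORed with HAS_WAITERS */
	bool unlock_deferred;      /* hypothetical stand-in for a queued irq_work */
};

static bool toy_in_nmi;            /* hypothetical stand-in for in_nmi() */

/* Stand-in for the slow trylock; the real one takes wait_lock first. */
static int toy_slowtrylock(struct toy_lock *l, uintptr_t task)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&l->owner, &expected, task);
}

static int toy_trylock(struct toy_lock *l, uintptr_t task)
{
	uintptr_t expected = 0;

	/* Fast path: a single acquire cmpxchg, fine from any context. */
	if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, task,
						    memory_order_acquire,
						    memory_order_relaxed))
		return 1;
	if (toy_in_nmi)
		return 0;          /* never enter the slow path from NMI */
	return toy_slowtrylock(l, task);
}

/* Stand-in for the slow unlock; the real one takes wait_lock and wakes a waiter. */
static void toy_slowunlock(struct toy_lock *l)
{
	atomic_store(&l->owner, 0);
	l->unlock_deferred = false;
}

static void toy_unlock(struct toy_lock *l, uintptr_t task)
{
	uintptr_t expected = task;

	/* Fast path: succeeds only when the waiters bit is clear. */
	if (atomic_compare_exchange_strong_explicit(&l->owner, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed))
		return;
	if (toy_in_nmi) {
		l->unlock_deferred = true;   /* would be irq_work_queue() */
		return;
	}
	toy_slowunlock(l);
}

/* Models the deferred work running once the NMI has returned. */
static void toy_run_deferred(struct toy_lock *l)
{
	if (l->unlock_deferred)
		toy_slowunlock(l);
}
```

Note that in this model the lock still looks owned between the NMI
returning and the deferred unlock running, so any trylock in that window
fails. Whether that window is acceptable for the real lock is part of the
open question above.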