On 2024-12-10 14:06:32 [-0800], Alexei Starovoitov wrote:
> > Is there any reason why GFP_ATOMIC cannot be extended to support new
> > contexts? This allocation mode is already documented to be usable from
> > atomic contexts except from NMI and raw_spinlocks. But is it feasible to
> > extend the current implementation to use only trylock on zone->lock if
> > called from in_nmi() to reduce unexpected failures on contention for
> > existing users?
>
> No. in_nmi() doesn't help. It's the lack of reentrance of slab and page
> allocator that is an issue.
> The page allocator might grab zone lock. In !RT it will disable irqs.
> In RT will stay sleepable. Both paths will be calling other
> kernel code including tracepoints, potential kprobes, etc
> and bpf prog may be attached somewhere.
> If it calls alloc_page() it may deadlock on zone->lock.
> pcpu lock is thankfully trylock already.
> So !irqs_disabled() part of preemptible() guarantees that
> zone->lock won't deadlock in !RT.
> And rcu_preempt_depth() case just steers bpf into try lock only path in RT.
> Since there is no way to tell whether it's safe to call
> sleepable spin_lock(&zone->lock).

Oh. You don't need to check rcu_preempt_depth() for that. On PREEMPT_RT
rcu_preempt_depth() is incremented with every spin_lock() because we
need an explicit start of a RCU section (same thing happens with
preempt_disable() spin_lock()). If there is already a RCU section
(rcu_preempt_depth() > 0) you can still try to acquire a spinlock_t and
maybe schedule out/sleep. That is okay.

But since I see in_nmi(). You can't trylock from NMI on RT. The trylock
part is easy but unlock might need to acquire rt_mutex_base::wait_lock
and worst case is to wake a waiter via wake_up_process().

Sebastian