Hi Hou Tao,

On Sat, Aug 20, 2022 at 8:14 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> From: Hou Tao <houtao1@xxxxxxxxxx>
>
> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
> from both NMI and non-NMI contexts. But since 74d862b682f51 ("sched:
> Make migrate_disable/enable() independent of RT"), migrate_disable()
> is also preemptible in the !PREEMPT_RT case, so now map_locked also
> disallows concurrent updates from normal contexts (e.g. userspace
> processes) unexpectedly as shown below:
>
> process A                      process B
>
> htab_map_update_elem()
>   htab_lock_bucket()
>     migrate_disable()
>     /* return 1 */
>     __this_cpu_inc_return()
>     /* preempted by B */
>
>                                htab_map_update_elem()
>                                  /* the same bucket as A */
>                                  htab_lock_bucket()
>                                    migrate_disable()
>                                    /* return 2, so lock fails */
>                                    __this_cpu_inc_return()
>                                    return -EBUSY
>
> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
> only checking the value of map_locked for the NMI context. But that would
> re-introduce a dead-lock on the bucket lock if htab_lock_bucket() is
> re-entered through a non-tracing program (e.g. a fentry program).
>
> So fix it by using preempt_disable() instead of migrate_disable() when
> increasing htab->map_locked. However, when htab_use_raw_lock() is false,
> the bucket lock will be a sleepable spin-lock, which must not be taken
> with preemption disabled, so still use migrate_disable() for the
> spin-lock case.
>
> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> ---

IIUC, this patch enlarges the scope of preemption disable to cover the
increment of map_locked. But I don't think the change is meaningful.

This patch only affects the case when the raw lock is used. With the raw
lock, IRQs are disabled for the b->raw_lock protected critical section,
and a raw spin lock itself doesn't sleep on either RT or non-RT. So my
understanding of this patch is that it just makes sure preemption cannot
happen around the exact __this_cpu_inc_return(). But that window is so
small that it should be really unlikely to hit.
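To make the interleaving above concrete, here is a minimal user-space model
of the spurious -EBUSY (plain C, purely illustrative and not kernel code:
the single counter stands in for this CPU's htab->map_locked[hash], the
function names are made up, and the "preemption" is only a comment):

#include <errno.h>
#include <stdio.h>

/* models *htab->map_locked[hash] for one CPU */
static int map_locked;

static int model_lock_bucket(const char *task)
{
	/* models __this_cpu_inc_return(*(htab->map_locked[hash])) != 1 */
	if (++map_locked != 1) {
		--map_locked;
		printf("%s: -EBUSY (counter already elevated on this CPU)\n", task);
		return -EBUSY;
	}
	printf("%s: counter taken, would now take the bucket lock\n", task);
	return 0;
}

static void model_unlock_bucket(const char *task)
{
	--map_locked;
	printf("%s: counter released\n", task);
}

int main(void)
{
	model_lock_bucket("process A");   /* A increments the counter to 1 ...      */
	                                  /* ... and is preempted right here        */
	model_lock_bucket("process B");   /* B, same CPU and bucket, sees 2: -EBUSY */
	model_unlock_bucket("process A"); /* once A runs again, things recover      */
	return 0;
}

With preempt_disable() in place of migrate_disable() on the raw-lock path,
the "preempted right here" step cannot happen, which is what the patch is
after.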
> kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
> 1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 6c530a5e560a..ad09da139589 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>                                     unsigned long *pflags)
>  {
>         unsigned long flags;
> +       bool use_raw_lock;
>
>         hash = hash & HASHTAB_MAP_LOCK_MASK;
>
> -       migrate_disable();
> +       use_raw_lock = htab_use_raw_lock(htab);
> +       if (use_raw_lock)
> +               preempt_disable();
> +       else
> +               migrate_disable();
>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>                 __this_cpu_dec(*(htab->map_locked[hash]));
> -               migrate_enable();
> +               if (use_raw_lock)
> +                       preempt_enable();
> +               else
> +                       migrate_enable();
>                 return -EBUSY;
>         }
>
> -       if (htab_use_raw_lock(htab))
> +       if (use_raw_lock)
>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
>         else
>                 spin_lock_irqsave(&b->lock, flags);
> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>                                        struct bucket *b, u32 hash,
>                                        unsigned long flags)
>  {
> +       bool use_raw_lock = htab_use_raw_lock(htab);
> +
>         hash = hash & HASHTAB_MAP_LOCK_MASK;
> -       if (htab_use_raw_lock(htab))
> +       if (use_raw_lock)
>                 raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>         else
>                 spin_unlock_irqrestore(&b->lock, flags);
>         __this_cpu_dec(*(htab->map_locked[hash]));
> -       migrate_enable();
> +       if (use_raw_lock)
> +               preempt_enable();
> +       else
> +               migrate_enable();
>  }
>
>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> --
> 2.29.2
>
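To spell out where preemption can actually hit on the raw-lock path before
this patch, here is a simplified flattening of the
htab_lock_bucket()/htab_unlock_bucket() pair for the raw-lock case, for
illustration only (not meant to compile or be applied), with the two
preemptible windows marked:

	migrate_disable();      /* on !PREEMPT_RT this no longer disables preemption */
	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
		__this_cpu_dec(*(htab->map_locked[hash]));
		migrate_enable();
		return -EBUSY;
	}
	/* (1) preemptible window: counter elevated, bucket lock not yet held */
	raw_spin_lock_irqsave(&b->raw_lock, flags);
	/* ... critical section: IRQs off, so no preemption in here ... */
	raw_spin_unlock_irqrestore(&b->raw_lock, flags);
	/* (2) preemptible window: bucket lock dropped, counter still elevated */
	__this_cpu_dec(*(htab->map_locked[hash]));
	migrate_enable();

Window (1) is where process B's spurious -EBUSY in the commit message comes
from, and switching to preempt_disable() only closes (1) and (2); those two
spots are the small window I mentioned above.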