On 8/29/2022 6:39 AM, KP Singh wrote:
> On Sat, Aug 27, 2022 at 11:43 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>
>> Per-cpu htab->map_locked is used to prohibit the concurrent accesses
>> from both NMI and non-NMI contexts. But since commit 74d862b682f5
>> ("sched: Make migrate_disable/enable() independent of RT"),
>> migrations_disable() is also preemptible under CONFIG_PREEMPT case,
>
> nit: migrate_disable

Will fix in v3.

>> so now map_locked also disallows concurrent updates from normal contexts
>> (e.g. userspace processes) unexpectedly as shown below:
>>
>> process A                      process B
>>
>> htab_map_update_elem()
>>   htab_lock_bucket()
>>     migrate_disable()
>>     /* return 1 */
>>     __this_cpu_inc_return()
>>     /* preempted by B */
>>
>>                                htab_map_update_elem()
>>                                  /* the same bucket as A */
>>                                  htab_lock_bucket()
>>                                    migrate_disable()
>>                                    /* return 2, so lock fails */
>>                                    __this_cpu_inc_return()
>>                                    return -EBUSY
>>
>> A fix that seems feasible is using in_nmi() in htab_lock_bucket() and
>> only checking the value of map_locked for nmi context. But it will
>> re-introduce dead-lock on bucket lock if htab_lock_bucket() is re-entered
>> through non-tracing program (e.g. fentry program).
>>
>> So fixing it by using disable_preempt() instead of migrate_disable() when
>> increasing htab->map_locked. However when htab_use_raw_lock() is false,
>> bucket lock will be a sleepable spin-lock and it breaks disable_preempt(),
>> so still use migrate_disable() for spin-lock case and leave the
>> concurrent map updates problem to BPF memory allocator patchset in which
>> !htab_use_raw_lock() case will be removed.
>
> I think the description needs a bit more clarity and I think you mean
> preempt_disable() instead of disable_preempt
>
> Suggestion:
>
> One cannot use preempt_disable() to fix this issue as htab_use_raw_lock
> being false causes the bucket lock to be a spin lock which can sleep and
> does not work with preempt_disable().
>
> Therefore, use migrate_disable() when using the spinlock instead of
> preempt_disable() and defer fixing concurrent updates to when the kernel
> has its own BPF memory allocator.

Will update the commit message in v3 and thanks for the suggestion.

>> Reviewed-by: Hao Luo <haoluo@xxxxxxxxxx>
>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>
> Fixes tag here please?

Will add "Fixes: 74d862b682f5 ("sched: Make migrate_disable/enable()
independent of RT")" in v3.

>> ---
>>  kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index b301a63afa2f..6fb3b7fd1622 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>                                     unsigned long *pflags)
>>  {
>>         unsigned long flags;
>> +       bool use_raw_lock;
>>
>>         hash = hash & HASHTAB_MAP_LOCK_MASK;
>>
>> -       migrate_disable();
>> +       use_raw_lock = htab_use_raw_lock(htab);
>> +       if (use_raw_lock)
>> +               preempt_disable();
>> +       else
>> +               migrate_disable();
>>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>                 __this_cpu_dec(*(htab->map_locked[hash]));
>> -               migrate_enable();
>> +               if (use_raw_lock)
>> +                       preempt_enable();
>> +               else
>> +                       migrate_enable();
>>                 return -EBUSY;
>>         }
>>
>> -       if (htab_use_raw_lock(htab))
>> +       if (use_raw_lock)
>>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
>>         else
>>                 spin_lock_irqsave(&b->lock, flags);
>> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
>>                                       struct bucket *b, u32 hash,
>>                                       unsigned long flags)
>>  {
>> +       bool use_raw_lock = htab_use_raw_lock(htab);
>> +
>>         hash = hash & HASHTAB_MAP_LOCK_MASK;
>> -       if (htab_use_raw_lock(htab))
>> +       if (use_raw_lock)
>>                 raw_spin_unlock_irqrestore(&b->raw_lock, flags);
>>         else
>>                 spin_unlock_irqrestore(&b->lock, flags);
>>         __this_cpu_dec(*(htab->map_locked[hash]));
>> -       migrate_enable();
>> +       if (use_raw_lock)
>> +               preempt_enable();
>> +       else
>> +               migrate_enable();
>>  }
>>
>>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
>> --
>> 2.29.2
>>