Hi, Hou Tao

On Sun, Aug 21, 2022 at 6:28 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 8/22/2022 12:42 AM, Hao Luo wrote:
> > Hi Hou Tao,
> >
> > On Sat, Aug 20, 2022 at 8:14 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> From: Hou Tao <houtao1@xxxxxxxxxx>
> >>
> >> Per-cpu htab->map_locked is used to prohibit concurrent accesses
> >> from both NMI and non-NMI contexts. But since 74d862b682f51 ("sched:
> >> Make migrate_disable/enable() independent of RT"), code under
> >> migrate_disable() is also preemptible in the !PREEMPT_RT case, so now
> >> map_locked also unexpectedly rejects concurrent updates from normal
> >> contexts (e.g. userspace processes), as shown below:
> >>
> >> process A                      process B
> >>
> >> htab_map_update_elem()
> >>   htab_lock_bucket()
> >>     migrate_disable()
> >>     /* return 1 */
> >>     __this_cpu_inc_return()
> >>     /* preempted by B */
> >>
> >>                                htab_map_update_elem()
> >>                                  /* the same bucket as A */
> >>                                  htab_lock_bucket()
> >>                                    migrate_disable()
> >>                                    /* return 2, so lock fails */
> >>                                    __this_cpu_inc_return()
> >>                                    return -EBUSY
> >>
> >> A fix that seems feasible is to use in_nmi() in htab_lock_bucket() and
> >> only check the value of map_locked in NMI context. But that would
> >> re-introduce a deadlock on the bucket lock if htab_lock_bucket() is
> >> re-entered through a non-tracing program (e.g. a fentry program).
> >>
> >> So fix it by using preempt_disable() instead of migrate_disable() when
> >> increasing htab->map_locked. However, when htab_use_raw_lock() is false,
> >> the bucket lock is a sleepable spin-lock, which is incompatible with
> >> preempt_disable(), so keep migrate_disable() for the spin-lock case.
> >>
> >> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> >> ---
> > IIUC, this patch enlarges the scope of preemption disable to cover the
> > increment of map_locked. But I don't think the change is meaningful.
> Before 74d862b682f51 ("sched: Make migrate_disable/enable() independent of
> RT"), preemption was disabled before increasing map_locked in the
> !PREEMPT_RT case, so I don't think the change is meaningless.
> >
> > This patch only affects the case when the raw lock is used. In the raw
> > lock case, irqs are disabled for the critical section protected by
> > b->raw_lock. A raw spin lock itself doesn't sleep on either RT or
> > non-RT. So my understanding of this patch is that it just makes sure
> > preemption doesn't happen on the exact __this_cpu_inc_return. But the
> > window is so small that it should be really unlikely to happen.
> No, it can be easily reproduced by running multiple htab update processes
> on the same CPU. Will add a selftest to demonstrate that.

Can you clarify what the selftest would demonstrate? Here is my theory, but
please correct me if I'm wrong; I haven't tested it yet.

In non-RT, I doubt preemption is likely to happen after migrate_disable().
That is because very soon after migrate_disable() we enter the critical
section of b->raw_lock with irqs disabled. In RT, preemption can certainly
happen while acquiring b->lock, but that is the !use_raw_lock path, which
this patch doesn't change.
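
For concreteness, below is a rough, untested userspace sketch of the kind of
reproducer I imagine (not your selftest; the map name, key and loop counts are
made up for illustration). It assumes a plain BPF_MAP_TYPE_HASH created with
libbpf's bpf_map_create(), pins two threads to CPU 0 with sched_setaffinity(),
and hammers the same key so both threads hit the same bucket and the same
map_locked counter:

/* Untested sketch, for illustration only: two threads pinned to the same
 * CPU, both updating the same key (and thus the same bucket) of a plain
 * BPF_MAP_TYPE_HASH. If one thread is preempted between the map_locked
 * increment and taking the bucket raw lock, the other should see -EBUSY.
 * Needs root (or CAP_BPF) and libbpf.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <bpf/bpf.h>

static int map_fd;

static void *update_loop(void *arg)
{
	long key = 1, val = 1, ebusy = 0;
	cpu_set_t set;
	int i;

	/* Pin this thread to CPU 0 so the two updaters can preempt
	 * each other inside htab_lock_bucket().
	 */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);

	for (i = 0; i < 1000000; i++) {
		/* Same key => same bucket => same map_locked slot. */
		if (bpf_map_update_elem(map_fd, &key, &val, BPF_ANY) &&
		    errno == EBUSY)
			ebusy++;
	}
	return (void *)ebusy;
}

int main(void)
{
	pthread_t tids[2];
	void *ret;
	int i;

	map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "repro", sizeof(long),
				sizeof(long), 16, NULL);
	if (map_fd < 0)
		return 1;

	for (i = 0; i < 2; i++)
		pthread_create(&tids[i], NULL, update_loop, NULL);
	for (i = 0; i < 2; i++) {
		pthread_join(tids[i], &ret);
		printf("thread %d: %ld updates returned EBUSY\n", i, (long)ret);
	}
	return 0;
}

If something like that never reports EBUSY on !PREEMPT_RT, that would support
my theory above; if it does, the selftest is clearly worthwhile.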
> >
> >>  kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
> >>  1 file changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 6c530a5e560a..ad09da139589 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
> >>                                    unsigned long *pflags)
> >>  {
> >>         unsigned long flags;
> >> +       bool use_raw_lock;
> >>
> >>         hash = hash & HASHTAB_MAP_LOCK_MASK;
> >>
> >> -       migrate_disable();
> >> +       use_raw_lock = htab_use_raw_lock(htab);
> >> +       if (use_raw_lock)
> >> +               preempt_disable();
> >> +       else
> >> +               migrate_disable();
> >>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
> >>                 __this_cpu_dec(*(htab->map_locked[hash]));
> >> -               migrate_enable();
> >> +               if (use_raw_lock)
> >> +                       preempt_enable();
> >> +               else
> >> +                       migrate_enable();
> >>                 return -EBUSY;
> >>         }
> >>
> >> -       if (htab_use_raw_lock(htab))
> >> +       if (use_raw_lock)
> >>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
> >>         else
> >>                 spin_lock_irqsave(&b->lock, flags);
> >> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
> >>                                       struct bucket *b, u32 hash,
> >>                                       unsigned long flags)
> >>  {
> >> +       bool use_raw_lock = htab_use_raw_lock(htab);
> >> +
> >>         hash = hash & HASHTAB_MAP_LOCK_MASK;
> >> -       if (htab_use_raw_lock(htab))
> >> +       if (use_raw_lock)
> >>                 raw_spin_unlock_irqrestore(&b->raw_lock, flags);
> >>         else
> >>                 spin_unlock_irqrestore(&b->lock, flags);
> >>         __this_cpu_dec(*(htab->map_locked[hash]));
> >> -       migrate_enable();
> >> +       if (use_raw_lock)
> >> +               preempt_enable();
> >> +       else
> >> +               migrate_enable();
> >>  }
> >>
> >>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> >> --
> >> 2.29.2
> >>
> > .
>