Re: [PATCH 1/3] bpf: Disable preemption when increasing per-cpu map_locked

Hi, Hou Tao

On Sun, Aug 21, 2022 at 6:28 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 8/22/2022 12:42 AM, Hao Luo wrote:
> > Hi Hou Tao,
> >
> > On Sat, Aug 20, 2022 at 8:14 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> From: Hou Tao <houtao1@xxxxxxxxxx>
> >>
> >> Per-cpu htab->map_locked is used to prohibit concurrent accesses
> >> from both NMI and non-NMI contexts. But since 74d862b682f51 ("sched:
> >> Make migrate_disable/enable() independent of RT"), migrate_disable()
> >> is also preemptible in the !PREEMPT_RT case, so now map_locked also
> >> unexpectedly disallows concurrent updates from normal contexts (e.g.
> >> userspace processes), as shown below:
> >>
> >> process A                      process B
> >>
> >> htab_map_update_elem()
> >>   htab_lock_bucket()
> >>     migrate_disable()
> >>     /* return 1 */
> >>     __this_cpu_inc_return()
> >>     /* preempted by B */
> >>
> >>                                htab_map_update_elem()
> >>                                  /* the same bucket as A */
> >>                                  htab_lock_bucket()
> >>                                    migrate_disable()
> >>                                    /* return 2, so lock fails */
> >>                                    __this_cpu_inc_return()
> >>                                    return -EBUSY
> >>
> >> A seemingly feasible fix is to use in_nmi() in htab_lock_bucket() and
> >> only check the value of map_locked in NMI context. But that would
> >> re-introduce a deadlock on the bucket lock if htab_lock_bucket() is
> >> re-entered through a non-tracing program (e.g. a fentry program).
> >>
> >> So fix it by using preempt_disable() instead of migrate_disable() when
> >> increasing htab->map_locked. However, when htab_use_raw_lock() is false,
> >> the bucket lock will be a sleepable spin-lock, which is incompatible
> >> with preempt_disable(), so keep using migrate_disable() for the
> >> spin-lock case.
> >>
> >> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> >> ---
> > IIUC, this patch enlarges the scope of preemption disabling to cover
> > the increment of map_locked. But I don't think the change is meaningful.
> Before 74d862b682f51 ("sched: Make migrate_disable/enable() independent of
> RT"), preemption was disabled before increasing map_locked in the
> !PREEMPT_RT case, so I don't think the change is meaningless.
> >
> > This patch only affects the case where the raw lock is used. In the
> > raw-lock case, irqs are disabled for the critical section protected by
> > b->raw_lock. A raw spin lock itself doesn't block on either RT or
> > non-RT. So my understanding of this patch is that it just makes sure
> > preemption doesn't happen around the __this_cpu_inc_return itself. But
> > that window is so small that it should be really unlikely to hit.
> No, it can easily be reproduced by running multiple htab update processes
> on the same CPU. Will add a selftest to demonstrate that.

Can you clarify what the selftest will demonstrate?
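
(For example, is it something like the following? A purely hypothetical,
untested userspace sketch: two processes pinned to the same CPU, both
updating the same key in a loop and counting -EBUSY returns. The libbpf
map setup is elided; map_fd is assumed to refer to a BPF_MAP_TYPE_HASH
with 8-byte keys and values.)

    /* Hypothetical repro sketch, untested: pin the caller to CPU 0 and
     * hammer updates on the same key. Run two such processes at once;
     * with the preemption window described in the commit message, some
     * updates should fail with -EBUSY from htab_lock_bucket().
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <bpf/bpf.h>

    static void hammer(int map_fd)
    {
            cpu_set_t set;
            long key = 1, val = 1, busy = 0, i;

            CPU_ZERO(&set);
            CPU_SET(0, &set);       /* both processes on the same CPU */
            sched_setaffinity(0, sizeof(set), &set);

            for (i = 0; i < 1000000; i++)
                    if (bpf_map_update_elem(map_fd, &key, &val, BPF_ANY))
                            busy++;  /* presumably -EBUSY */

            printf("pid %d: %ld failed updates\n", (int)getpid(), busy);
    }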

Here is my theory, but please correct me if I'm wrong; I haven't
tested it yet. In non-RT, I doubt preemption is likely to happen after
migrate_disable(). That is because, very soon after migrate_disable(), we
enter the critical section of b->raw_lock with irqs disabled. In RT,
preemption can happen while acquiring b->lock; that is certainly
possible, but that is the !use_raw_lock path, which isn't changed by
this patch.
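
To make sure we are talking about the same window, here is the pre-patch
raw-lock path as I read it (a simplified sketch of htab_lock_bucket(),
not the verbatim kernel code; the annotations mark where I believe
preemption can and cannot happen):

    static inline int htab_lock_bucket(const struct bpf_htab *htab,
                                       struct bucket *b, u32 hash,
                                       unsigned long *pflags)
    {
            unsigned long flags;

            hash = hash & HASHTAB_MAP_LOCK_MASK;

            migrate_disable();
            /* Preemption is still possible here on !PREEMPT_RT since
             * 74d862b682f51, so two tasks on one CPU can both reach
             * the increment below (the commit message's scenario).
             */
            if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
                    __this_cpu_dec(*(htab->map_locked[hash]));
                    migrate_enable();
                    return -EBUSY;
            }
            /* A preemption window also exists here, between the
             * increment and taking the lock with irqs disabled.
             */
            raw_spin_lock_irqsave(&b->raw_lock, flags);
            /* From here until unlock, irqs are off, so no preemption
             * inside the critical section itself.
             */
            *pflags = flags;
            return 0;
    }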

> >
> >>  kernel/bpf/hashtab.c | 23 ++++++++++++++++++-----
> >>  1 file changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> >> index 6c530a5e560a..ad09da139589 100644
> >> --- a/kernel/bpf/hashtab.c
> >> +++ b/kernel/bpf/hashtab.c
> >> @@ -162,17 +162,25 @@ static inline int htab_lock_bucket(const struct bpf_htab *htab,
> >>                                    unsigned long *pflags)
> >>  {
> >>         unsigned long flags;
> >> +       bool use_raw_lock;
> >>
> >>         hash = hash & HASHTAB_MAP_LOCK_MASK;
> >>
> >> -       migrate_disable();
> >> +       use_raw_lock = htab_use_raw_lock(htab);
> >> +       if (use_raw_lock)
> >> +               preempt_disable();
> >> +       else
> >> +               migrate_disable();
> >>         if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
> >>                 __this_cpu_dec(*(htab->map_locked[hash]));
> >> -               migrate_enable();
> >> +               if (use_raw_lock)
> >> +                       preempt_enable();
> >> +               else
> >> +                       migrate_enable();
> >>                 return -EBUSY;
> >>         }
> >>
> >> -       if (htab_use_raw_lock(htab))
> >> +       if (use_raw_lock)
> >>                 raw_spin_lock_irqsave(&b->raw_lock, flags);
> >>         else
> >>                 spin_lock_irqsave(&b->lock, flags);
> >> @@ -185,13 +193,18 @@ static inline void htab_unlock_bucket(const struct bpf_htab *htab,
> >>                                       struct bucket *b, u32 hash,
> >>                                       unsigned long flags)
> >>  {
> >> +       bool use_raw_lock = htab_use_raw_lock(htab);
> >> +
> >>         hash = hash & HASHTAB_MAP_LOCK_MASK;
> >> -       if (htab_use_raw_lock(htab))
> >> +       if (use_raw_lock)
> >>                 raw_spin_unlock_irqrestore(&b->raw_lock, flags);
> >>         else
> >>                 spin_unlock_irqrestore(&b->lock, flags);
> >>         __this_cpu_dec(*(htab->map_locked[hash]));
> >> -       migrate_enable();
> >> +       if (use_raw_lock)
> >> +               preempt_enable();
> >> +       else
> >> +               migrate_enable();
> >>  }
> >>
> >>  static bool htab_lru_map_delete_node(void *arg, struct bpf_lru_node *node);
> >> --
> >> 2.29.2
> >>
>


