On Mon, 29 Apr 2024 at 13:47, Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> On Mon, 29 Apr 2024 at 18:57, Siddharth Chintamaneni
> <sidchintamaneni@xxxxxxxxx> wrote:
> >
> > From: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> >
> > This patch addresses a possible deadlock in the queue and
> > stack map types.
> >
> > A deadlock can happen when a nested BPF program
> > acquires the same lock as the parent BPF program
> > to perform a write operation on the same map as
> > the first one. This bug was also reported by
> > syzbot.
> >
> > Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/
> > Reported-by: syzbot+8bdfc2c53fb2b63e1871@xxxxxxxxxxxxxxxxxxxxxxxxx
> > Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
> > Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> > ---
> >  kernel/bpf/queue_stack_maps.c | 42 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 42 insertions(+)
> >
> > diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> > index d869f51ea93a..4b7df1a53cf2 100644
> > --- a/kernel/bpf/queue_stack_maps.c
> > +++ b/kernel/bpf/queue_stack_maps.c
> > @@ -18,6 +18,7 @@ struct bpf_queue_stack {
> >  	raw_spinlock_t lock;
> >  	u32 head, tail;
> >  	u32 size; /* max_entries + 1 */
> > +	int __percpu *map_locked;
> >
> >  	char elements[] __aligned(8);
> >  };
> > @@ -78,6 +79,16 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
> >
> >  	qs->size = size;
> >
> > +	qs->map_locked = bpf_map_alloc_percpu(&qs->map,
> > +					      sizeof(int),
> > +					      sizeof(int),
> > +					      GFP_USER);
>
> GFP_USER | __GFP_NOWARN, like we do everywhere else.
>
> > +	if (!qs->map_locked) {
> > +		bpf_map_area_free(qs);
> > +		return ERR_PTR(-ENOMEM);
> > +	}
> > +
> > +
> >  	raw_spin_lock_init(&qs->lock);
> >
> >  	return &qs->map;
> > @@ -98,6 +109,16 @@ static long __queue_map_get(struct bpf_map *map, void *value, bool delete)
> >  	int err = 0;
> >  	void *ptr;
> >
> > +	preempt_disable();
> > +	local_irq_save(flags);
> > +	if (unlikely(__this_cpu_inc_return(*(qs->map_locked)) != 1)) {
> > +		__this_cpu_dec(*(qs->map_locked));
> > +		local_irq_restore(flags);
> > +		preempt_enable();
> > +		return -EBUSY;
> > +	}
> > +	preempt_enable();
> > +
>
> You increment the map_locked counter, but never decrement it after
> unlock; likewise in all the other cases. This operation can therefore
> never be called again on a given CPU for a given map after the first
> time, which is probably why CI is also failing:
> https://github.com/kernel-patches/bpf/actions/runs/8882578097/job/24387802831?pr=6915
> returns -16 (EBUSY).
> E.g. check hashtab.c, which does __this_cpu_dec after unlock in
> htab_unlock_bucket.

My bad, I sent the wrong patch; I will send a revised version.

> > [...]
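
[Editor's note: for reference, here is a minimal sketch of the lock/unlock
helper pattern the review points to, modeled on htab_lock_bucket() and
htab_unlock_bucket() in kernel/bpf/hashtab.c. The queue_stack_map_lock()
and queue_stack_map_unlock() helper names are illustrative assumptions,
not the submitted v2 patch; the key point is the __this_cpu_dec() after
raw_spin_unlock().]

	/*
	 * Sketch only: fail with -EBUSY if this CPU already holds the
	 * map's lock (i.e. a nested BPF program re-entered the map),
	 * otherwise mark the map busy on this CPU and take the lock.
	 */
	static inline int queue_stack_map_lock(struct bpf_queue_stack *qs,
					       unsigned long *pflags)
	{
		unsigned long flags;

		preempt_disable();
		local_irq_save(flags);
		if (unlikely(__this_cpu_inc_return(*qs->map_locked) != 1)) {
			/* Re-entry on this CPU: back out to avoid deadlock. */
			__this_cpu_dec(*qs->map_locked);
			local_irq_restore(flags);
			preempt_enable();
			return -EBUSY;
		}

		raw_spin_lock(&qs->lock);
		*pflags = flags;
		return 0;
	}

	static inline void queue_stack_map_unlock(struct bpf_queue_stack *qs,
						  unsigned long flags)
	{
		raw_spin_unlock(&qs->lock);
		/* The piece missing from v1: drop the per-CPU marker. */
		__this_cpu_dec(*qs->map_locked);
		local_irq_restore(flags);
		preempt_enable();
	}

Keeping the per-CPU marker and the raw spinlock in lockstep inside one
IRQ-disabled, preemption-disabled region means a re-entrant BPF program
on the same CPU bails out with -EBUSY instead of spinning on a lock its
own call chain already holds.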