On Mon, 29 Apr 2024 at 18:57, Siddharth Chintamaneni
<sidchintamaneni@xxxxxxxxx> wrote:
>
> From: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
>
> This patch addresses a possible deadlock issue in queue and
> stack map types.
>
> Deadlock could happen when a nested BPF program
> acquires the same lock as the parent BPF program
> to perform a write operation on the same map as
> the first one. This bug is also reported by
> syzbot.
>
> Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@xxxxxxxxxx/
> Reported-by: syzbot+8bdfc2c53fb2b63e1871@xxxxxxxxxxxxxxxxxxxxxxxxx
> Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
> Signed-off-by: Siddharth Chintamaneni <sidchintamaneni@xxxxxx>
> ---
>  kernel/bpf/queue_stack_maps.c | 42 +++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
>
> diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> index d869f51ea93a..4b7df1a53cf2 100644
> --- a/kernel/bpf/queue_stack_maps.c
> +++ b/kernel/bpf/queue_stack_maps.c
> @@ -18,6 +18,7 @@ struct bpf_queue_stack {
>  	raw_spinlock_t lock;
>  	u32 head, tail;
>  	u32 size; /* max_entries + 1 */
> +	int __percpu *map_locked;
>
>  	char elements[] __aligned(8);
>  };
> @@ -78,6 +79,16 @@ static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr)
>
>  	qs->size = size;
>
> +	qs->map_locked = bpf_map_alloc_percpu(&qs->map,
> +					      sizeof(int),
> +					      sizeof(int),
> +					      GFP_USER);

GFP_USER | __GFP_NOWARN, like we do everywhere else (see the adjusted
call at the end of this mail).

> +	if (!qs->map_locked) {
> +		bpf_map_area_free(qs);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +
>  	raw_spin_lock_init(&qs->lock);
>
>  	return &qs->map;
> @@ -98,6 +109,16 @@ static long __queue_map_get(struct bpf_map *map, void *value, bool delete)
>  	int err = 0;
>  	void *ptr;
>
> +	preempt_disable();
> +	local_irq_save(flags);
> +	if (unlikely(__this_cpu_inc_return(*(qs->map_locked)) != 1)) {
> +		__this_cpu_dec(*(qs->map_locked));
> +		local_irq_restore(flags);
> +		preempt_enable();
> +		return -EBUSY;
> +	}
> +	preempt_enable();
> +

You increment the map_locked counter here, but never decrement it
after unlock. Likewise in all the other cases. As a result, this
operation can never be called again on a given CPU for a given map
after the first time. That is probably also why CI is failing:
https://github.com/kernel-patches/bpf/actions/runs/8882578097/job/24387802831?pr=6915
returns -16 (EBUSY).

E.g. check hashtab.c: it does __this_cpu_dec() after unlock in
htab_unlock_bucket().

> [...]
>
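To make the first point concrete, the allocation would become
something like this (untested sketch, just to show the flag):

	qs->map_locked = bpf_map_alloc_percpu(&qs->map, sizeof(int),
					      sizeof(int),
					      GFP_USER | __GFP_NOWARN);
	if (!qs->map_locked) {
		bpf_map_area_free(qs);
		return ERR_PTR(-ENOMEM);
	}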
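For the missing decrement, the cleanest fix is probably to factor the
sequence into paired helpers, following the htab_lock_bucket() /
htab_unlock_bucket() pattern in hashtab.c. Rough, untested sketch
(the helper names are made up, and unlike your patch it keeps
preemption disabled across the critical section, as hashtab.c does):

static inline int queue_stack_map_lock(struct bpf_queue_stack *qs,
				       unsigned long *pflags)
{
	unsigned long flags;

	preempt_disable();
	local_irq_save(flags);
	if (unlikely(__this_cpu_inc_return(*(qs->map_locked)) != 1)) {
		/* Re-entered on this CPU: back out and report busy. */
		__this_cpu_dec(*(qs->map_locked));
		local_irq_restore(flags);
		preempt_enable();
		return -EBUSY;
	}
	raw_spin_lock(&qs->lock);
	*pflags = flags;
	return 0;
}

static inline void queue_stack_map_unlock(struct bpf_queue_stack *qs,
					  unsigned long flags)
{
	raw_spin_unlock(&qs->lock);
	/* The decrement your patch is missing. */
	__this_cpu_dec(*(qs->map_locked));
	local_irq_restore(flags);
	preempt_enable();
}

Then __queue_map_get() and the other entry points become:

	if (queue_stack_map_lock(qs, &flags))
		return -EBUSY;
	...
	queue_stack_map_unlock(qs, flags);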