Re: [PATCH bpf-next v2 24/26] bpf: Implement verifier support for rqspinlock

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Thu, 13 Feb 2025 07:41:53 +0100

On Wed, 12 Feb 2025 at 01:08, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Thu, 2025-02-06 at 02:54 -0800, Kumar Kartikeya Dwivedi wrote:
> > Introduce verifier-side support for rqspinlock kfuncs. The first step is
> > allowing bpf_res_spin_lock type to be defined in map values and
> > allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK
> > field to recognize and validate.
> >
> > Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only
> > one of them (and at most one of them per-object, like before) must be
> > present. The bpf_res_spin_lock can also be used to protect objects that
> > require lock protection for their kfuncs, like BPF rbtree and linked
> > list.
> >
> > The verifier plumbing to simulate success and failure cases when calling
> > the kfuncs is done by pushing a new verifier state to the verifier state
> > stack which will verify the failure case upon calling the kfunc. The
> > path where success is indicated creates all lock reference state and IRQ
> > state (if necessary for irqsave variants). In the case of failure, the
> > state clears the registers r0-r5, sets the return value, and skips kfunc
> > processing, proceeding to the next instruction.
> >
> > When marking the return value for success case, the value is marked as
> > 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program,
> > whenever user checks the return value as 'if (ret)' or 'if (ret < 0)'
> > the verifier never traverses such branches for success cases, and would
> > be aware that the lock is not held in such cases.
> >
> > We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs
> > are invoked. We introduce a kfunc_class state to avoid mixing lock
> > irqrestore kfuncs with IRQ state created by bpf_local_irq_save.
> >
> > With all this infrastructure, these kfuncs become usable in programs
> > while satisfying all safety properties required by the kernel.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
>
> Apart from a few nits, I think this patch looks good.
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
>

Thanks!

> [...]
>
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 32c23f2a3086..ed444e44f524 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -115,6 +115,15 @@ struct bpf_reg_state {
> >                       int depth:30;
> >               } iter;
> >
> > +             /* For irq stack slots */
> > +             struct {
> > +                     enum {
> > +                             IRQ_KFUNC_IGNORE,
>
> Is this state ever used?
> mark_stack_slot_irq_flag() is always called with either NATIVE or LOCK.

Hm, no, it was just the default / invalid value, I guess it can be dropped.

>
> > +                             IRQ_NATIVE_KFUNC,
> > +                             IRQ_LOCK_KFUNC,
> > +                     } kfunc_class;
> > +             } irq;
> > +
> >               /* Max size from any of the above. */
> >               struct {
> >                       unsigned long raw1;
>
> [...]
>
> > @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno,
> >       }
> >
> >       rec = reg_btf_record(reg);
> > -     if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) {
> > -             verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local",
> > -                     map ? map->name : "kptr");
> > +     if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) {
> > +             verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local",
> > +                     map ? map->name : "kptr", lock_str);
> >               return -EINVAL;
> >       }
> > -     if (rec->spin_lock_off != val + reg->off) {
> > -             verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n",
> > -                     val + reg->off, rec->spin_lock_off);
> > +     spin_lock_off = is_res_lock ? rec->res_spin_lock_off : rec->spin_lock_off;
> > +     if (spin_lock_off != val + reg->off) {
> > +             verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n",
> > +                     val + reg->off, lock_str, spin_lock_off);
> >               return -EINVAL;
> >       }
> >       if (is_lock) {
> >               void *ptr;
> > +             int type;
> >
> >               if (map)
> >                       ptr = map;
> >               else
> >                       ptr = btf;
> >
> > -             if (cur->active_locks) {
> > -                     verbose(env,
> > -                             "Locking two bpf_spin_locks are not allowed\n");
> > -                     return -EINVAL;
> > +             if (!is_res_lock && cur->active_locks) {
>
> Nit: having '&& cur->active_locks' in this branch but not the one for
>      'is_res_lock' is a bit confusing. As far as I understand this is
>      just an optimization, and active_locks check could be done (or dropped)
>      in both cases.

Yeah, I can make it consistent by adding the check to both.

>
> > +                     if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) {
> > +                             verbose(env,
> > +                                     "Locking two bpf_spin_locks are not allowed\n");
> > +                             return -EINVAL;
> > +                     }
> > +             } else if (is_res_lock) {
> > +                     if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) {
> > +                             verbose(env, "Acquiring the same lock again, AA deadlock detected\n");
> > +                             return -EINVAL;
> > +                     }
> >               }
>
> Nit: there is no branch for find_lock_state(... REF_TYPE_RES_LOCK_IRQ ...),
>      this is not a problem, as other checks catch the imbalance in
>      number of unlocks or unlock of the same lock, but verifier won't
>      report the above "AA deadlock" message for bpf_res_spin_lock_irqsave().
>

Good point, will fix.

> The above two checks make it legal to take resilient lock while
> holding regular lock and vice versa. This is probably ok, can't figure
> out an example when this causes trouble.

Yeah, that shouldn't cause a problem.

>
> > -             err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr);
> > +
> > +             if (is_res_lock && is_irq)
> > +                     type = REF_TYPE_RES_LOCK_IRQ;
> > +             else if (is_res_lock)
> > +                     type = REF_TYPE_RES_LOCK;
> > +             else
> > +                     type = REF_TYPE_LOCK;
> > +             err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr);
> >               if (err < 0) {
> >                       verbose(env, "Failed to acquire lock state\n");
> >                       return err;
> >               }
> >       } else {
> >               void *ptr;
> > +             int type;
> >
> >               if (map)
> >                       ptr = map;
>
> [...]
>