On Wed, 12 Feb 2025 at 01:08, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Thu, 2025-02-06 at 02:54 -0800, Kumar Kartikeya Dwivedi wrote: > > Introduce verifier-side support for rqspinlock kfuncs. The first step is > > allowing bpf_res_spin_lock type to be defined in map values and > > allocated objects, so BTF-side is updated with a new BPF_RES_SPIN_LOCK > > field to recognize and validate. > > > > Any object cannot have both bpf_spin_lock and bpf_res_spin_lock, only > > one of them (and at most one of them per-object, like before) must be > > present. The bpf_res_spin_lock can also be used to protect objects that > > require lock protection for their kfuncs, like BPF rbtree and linked > > list. > > > > The verifier plumbing to simulate success and failure cases when calling > > the kfuncs is done by pushing a new verifier state to the verifier state > > stack which will verify the failure case upon calling the kfunc. The > > path where success is indicated creates all lock reference state and IRQ > > state (if necessary for irqsave variants). In the case of failure, the > > state clears the registers r0-r5, sets the return value, and skips kfunc > > processing, proceeding to the next instruction. > > > > When marking the return value for success case, the value is marked as > > 0, and for the failure case as [-MAX_ERRNO, -1]. Then, in the program, > > whenever user checks the return value as 'if (ret)' or 'if (ret < 0)' > > the verifier never traverses such branches for success cases, and would > > be aware that the lock is not held in such cases. > > > > We push the kfunc state in check_kfunc_call whenever rqspinlock kfuncs > > are invoked. We introduce a kfunc_class state to avoid mixing lock > > irqrestore kfuncs with IRQ state created by bpf_local_irq_save. > > > > With all this infrastructure, these kfuncs become usable in programs > > while satisfying all safety properties required by the kernel. > > > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> > > --- > > Apart from a few nits, I think this patch looks good. > > Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx> > Thanks! > [...] > > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h > > index 32c23f2a3086..ed444e44f524 100644 > > --- a/include/linux/bpf_verifier.h > > +++ b/include/linux/bpf_verifier.h > > @@ -115,6 +115,15 @@ struct bpf_reg_state { > > int depth:30; > > } iter; > > > > + /* For irq stack slots */ > > + struct { > > + enum { > > + IRQ_KFUNC_IGNORE, > > Is this state ever used? > mark_stack_slot_irq_flag() is always called with either NATIVE or LOCK. Hm, no, it was just the default / invalid value, I guess it can be dropped. > > > + IRQ_NATIVE_KFUNC, > > + IRQ_LOCK_KFUNC, > > + } kfunc_class; > > + } irq; > > + > > /* Max size from any of the above. */ > > struct { > > unsigned long raw1; > > [...] > > > @@ -8038,36 +8059,53 @@ static int process_spin_lock(struct bpf_verifier_env *env, int regno, > > } > > > > rec = reg_btf_record(reg); > > - if (!btf_record_has_field(rec, BPF_SPIN_LOCK)) { > > - verbose(env, "%s '%s' has no valid bpf_spin_lock\n", map ? "map" : "local", > > - map ? map->name : "kptr"); > > + if (!btf_record_has_field(rec, is_res_lock ? BPF_RES_SPIN_LOCK : BPF_SPIN_LOCK)) { > > + verbose(env, "%s '%s' has no valid %s_lock\n", map ? "map" : "local", > > + map ? map->name : "kptr", lock_str); > > return -EINVAL; > > } > > - if (rec->spin_lock_off != val + reg->off) { > > - verbose(env, "off %lld doesn't point to 'struct bpf_spin_lock' that is at %d\n", > > - val + reg->off, rec->spin_lock_off); > > + spin_lock_off = is_res_lock ? rec->res_spin_lock_off : rec->spin_lock_off; > > + if (spin_lock_off != val + reg->off) { > > + verbose(env, "off %lld doesn't point to 'struct %s_lock' that is at %d\n", > > + val + reg->off, lock_str, spin_lock_off); > > return -EINVAL; > > } > > if (is_lock) { > > void *ptr; > > + int type; > > > > if (map) > > ptr = map; > > else > > ptr = btf; > > > > - if (cur->active_locks) { > > - verbose(env, > > - "Locking two bpf_spin_locks are not allowed\n"); > > - return -EINVAL; > > + if (!is_res_lock && cur->active_locks) { > > Nit: having '&& cur->active_locks' in this branch but not the one for > 'is_res_lock' is a bit confusing. As far as I understand this is > just an optimization, and active_locks check could be done (or dropped) > in both cases. Yeah, I can make it consistent by adding the check to both. > > > + if (find_lock_state(env->cur_state, REF_TYPE_LOCK, 0, NULL)) { > > + verbose(env, > > + "Locking two bpf_spin_locks are not allowed\n"); > > + return -EINVAL; > > + } > > + } else if (is_res_lock) { > > + if (find_lock_state(env->cur_state, REF_TYPE_RES_LOCK, reg->id, ptr)) { > > + verbose(env, "Acquiring the same lock again, AA deadlock detected\n"); > > + return -EINVAL; > > + } > > } > > Nit: there is no branch for find_lock_state(... REF_TYPE_RES_LOCK_IRQ ...), > this is not a problem, as other checks catch the imbalance in > number of unlocks or unlock of the same lock, but verifier won't > report the above "AA deadlock" message for bpf_res_spin_lock_irqsave(). > Good point, will fix. > The above two checks make it legal to take resilient lock while > holding regular lock and vice versa. This is probably ok, can't figure > out an example when this causes trouble. Yeah, that shouldn't cause a problem. > > > - err = acquire_lock_state(env, env->insn_idx, REF_TYPE_LOCK, reg->id, ptr); > > + > > + if (is_res_lock && is_irq) > > + type = REF_TYPE_RES_LOCK_IRQ; > > + else if (is_res_lock) > > + type = REF_TYPE_RES_LOCK; > > + else > > + type = REF_TYPE_LOCK; > > + err = acquire_lock_state(env, env->insn_idx, type, reg->id, ptr); > > if (err < 0) { > > verbose(env, "Failed to acquire lock state\n"); > > return err; > > } > > } else { > > void *ptr; > > + int type; > > > > if (map) > > ptr = map; > > [...] >