Re: [PATCH bpf-next v3 4/7] bpf: Introduce support for bpf_local_irq_{save,restore}

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Thu, 28 Nov 2024 05:39:07 +0100

On Thu, 28 Nov 2024 at 05:31, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-11-27 at 08:58 -0800, Kumar Kartikeya Dwivedi wrote:
> > Teach the verifier about IRQ-disabled sections through the introduction
> > of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> > them, and bpf_local_irq_restore, to restore IRQ state and enable them
> > back again.
> >
> > For the purposes of tracking the saved IRQ state, the verifier is taught
> > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > a 8 byte value which saves the IRQ flags which are to be passed back to
> > the IRQ restore kfunc.
> >
> > Renumber the enums for REF_TYPE_* to simplify the check in
> > find_lock_state, filtering out non-lock types as they grow will become
> > cumbersome and is unecessary.
> >
> > To track a dynamic number of IRQ-disabled regions and their associated
> > saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> > state management functions: acquire_irq_state and release_irq_state,
> > taking advantage of the refactoring and clean ups made in earlier
> > commits.
> >
> > One notable requirement of the kernel's IRQ save and restore API is that
> > they cannot happen out of order. For this purpose, when releasing reference
> > we keep track of the prev_id we saw with REF_TYPE_IRQ. Since reference
> > states are inserted in increasing order of the index, this is used to
> > remember the ordering of acquisitions of IRQ saved states, so that we
> > maintain a logical stack in acquisition order of resource identities,
> > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > stack is maintained using bpf_verifier_state's active_irq_id.
> >
> > The logic to detect initialized and unitialized irq flag slots, marking
> > and unmarking is similar to how it's done for iterators. No additional
> > checks are needed in refsafe for REF_TYPE_IRQ, apart from the usual
> > check_id satisfiability check on the ref[i].id. We have to perform the
> > same check_ids check on state->active_irq_id as well.
> >
> > The kfuncs themselves are plain wrappers over local_irq_save and
> > local_irq_restore macros.
> >
> > Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
>
> Sorry, two more nits below.
>
> [...]
>
> > +static int unmark_stack_slot_irq_flag(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     struct bpf_stack_state *slot;
> > +     struct bpf_reg_state *st;
> > +     int spi, i, err;
> > +
> > +     spi = irq_flag_get_spi(env, reg);
> > +     if (spi < 0)
> > +             return spi;
> > +
> > +     slot = &state->stack[spi];
> > +     st = &slot->spilled_ptr;
> > +
> > +     err = release_irq_state(env->cur_state, st->ref_obj_id);
> > +     WARN_ON_ONCE(err && err != -EACCES);
> > +     if (err) {
> > +             verbose(env, "cannot restore irq state out of order\n");
>
> Nit: maybe also print acquire_irq_id and an instruction where it was acquired?

Ack. For printing the insn_idx, I guess just search in the refs array?

>
> > +             return err;
> > +     }
> > +
> > +     __mark_reg_not_init(env, st);
> > +
> > +     /* see unmark_stack_slots_dynptr() for why we need to set REG_LIVE_WRITTEN */
> > +     st->live |= REG_LIVE_WRITTEN;
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++)
> > +             slot->slot_type[i] = STACK_INVALID;
> > +
> > +     mark_stack_slot_scratched(env, spi);
> > +     return 0;
> > +}
> > +
> > +static bool is_irq_flag_reg_valid_uninit(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
> > +{
> > +     struct bpf_func_state *state = func(env, reg);
> > +     struct bpf_stack_state *slot;
> > +     int spi, i;
> > +
> > +     /* For -ERANGE (i.e. spi not falling into allocated stack slots), we
> > +      * will do check_mem_access to check and update stack bounds later, so
> > +      * return true for that case.
> > +      */
> > +     spi = irq_flag_get_spi(env, reg);
> > +     if (spi == -ERANGE)
> > +             return true;
>
> Nit: is it possible to swap is_irq_flag_reg_valid_uninit() and
>      check_mem_access(), so that ERANGE special case would be not needed?
>

I don't think so. For dynptr, iter, irq, ERANGE indicates stack needs
to be grown, so check_mem_access will naturally do that when writing.
When not ERANGE, we need to catch cases where we have a bad slot_type.
If we overwrote it with check_mem_access, then it would scrub the slot
type as well.

When I fixed this stuff for dynptr, we had to additionally
destroy_if_dynptr_stack_slot because it wasn't required to 'release' a
dynptr when overwriting it.
Andrii made sure this was necessary for iters so now slot_type ==
STACK_ITER is just rejected instead of overwrite without a destroy
operation.
Similar idea is followed for irq flag.

Just paging in context for all this, but I may be missing if you have
something in mind.

> > +     if (spi < 0)
> > +             return false;
> > +
> > +     slot = &state->stack[spi];
> > +
> > +     for (i = 0; i < BPF_REG_SIZE; i++)
> > +             if (slot->slot_type[i] == STACK_IRQ_FLAG)
> > +                     return false;
> > +     return true;
> > +}
>
> [...]
>