On Thu, 21 Nov 2024 at 21:21, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > Teach the verifier about IRQ-disabled sections through the introduction
> > of two new kfuncs: bpf_local_irq_save, to save IRQ state and disable
> > IRQs, and bpf_local_irq_restore, to restore IRQ state and enable them
> > again.
> >
> > For the purposes of tracking the saved IRQ state, the verifier is taught
> > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > an 8-byte value which saves the IRQ flags that are to be passed back to
> > the IRQ restore kfunc.
> >
> > To track a dynamic number of IRQ-disabled regions and their associated
> > saved states, a new resource type RES_TYPE_IRQ is introduced, with its
> > state management functions acquire_irq_state and release_irq_state,
> > taking advantage of the refactoring and clean-ups made in earlier
> > commits.
> >
> > One notable requirement of the kernel's IRQ save and restore API is that
> > restores cannot happen out of order. For this purpose, resource state is
> > extended with a new type-specific member 'prev_id'. This is used to
> > remember the ordering of acquisitions of IRQ saved states, so that we
> > maintain a logical stack of resource identities in acquisition order,
> > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > stack is maintained using bpf_func_state's active_irq_id.
> >
> > The logic to detect initialized and uninitialized IRQ flag slots, and to
> > mark and unmark them, is similar to how it's done for iterators. We do
> > need to update ressafe to perform a check_ids-based satisfiability
> > check, and additionally match prev_id for RES_TYPE_IRQ entries in the
> > resource array.
> >
> > The kfuncs themselves are plain wrappers over the local_irq_save and
> > local_irq_restore macros.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
>
> I think this matches what is done for iterators and dynptrs.
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
>
> [...]
>
> > @@ -263,10 +267,16 @@ struct bpf_resource_state {
> >  	 * is used purely to inform the user of a resource leak.
> >  	 */
> >  	int insn_idx;
> > -	/* Use to keep track of the source object of a lock, to ensure
> > -	 * it matches on unlock.
> > -	 */
> > -	void *ptr;
> > +	union {
> > +		/* Use to keep track of the source object of a lock, to ensure
> > +		 * it matches on unlock.
> > +		 */
> > +		void *ptr;
> > +		/* Track the reference id preceding the IRQ entry in acquisition
> > +		 * order, to enforce an ordering on the release.
> > +		 */
> > +		int prev_id;
> > +	};
>
> Nit: Do we anticipate any other resource kinds that would need LIFO
> acquire/release? If we do, an alternative to prev_id would be to
> organize bpf_func_state->res as a stack (by changing the
> erase_resource_state() implementation).

I don't think so; this was the one odd case requiring such an ordering,
so I tried to find the least intrusive way.

>
> [...]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 751c150f9e1c..302f0d5976be 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
> >  	return ret + 1;
> >  }
> >
> > +/* Keep unsigned long in the prototype so that the kfunc is usable when
> > + * emitted to vmlinux.h in BPF programs directly, but since unsigned long
> > + * may potentially be 4 bytes, always cast to u64 when reading/writing
> > + * from this pointer, as it always points to an 8-byte memory region on
> > + * the BPF stack.
> > + */
> > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
>
> Nit: 'unsigned long long' is guaranteed to be at least 64 bits.
> What would go wrong if 'u64' is used here?
It goes like this: if I make this unsigned long long * or u64 *, the
kfunc emitted to vmlinux.h expects a pointer of that type. Kernel code
typically passes unsigned long flags to these functions, and that's what
people are used to. Given that for --target=bpf unsigned long * always
points to an 8-byte value, I just did this, so that on 32-bit kernels we
don't end up relying on unsigned long still being 8 bytes when
fetching/storing flags on the BPF stack.

>
> > +{
> > +	u64 *ptr = (u64 *)flags__irq_flag;
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	*ptr = flags;
> > +}
>
> [...]
>
> > @@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
> >  	for (i = 0; i < state->acquired_res; i++) {
> >  		struct bpf_resource_state *s = &state->res[i];
> >
> > -		if (s->type == RES_TYPE_PTR || s->type != type)
> > +		if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)
>
> Nit: I think this would be easier to read if there was a bitmask
> associated with lock types.

Ack, will fix.

>
> >  		continue;
> >
> >  		if (s->id == id && s->ptr == ptr)
>
> [...]