On Thu, 21 Nov 2024 at 21:21, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> > Teach the verifier about IRQ-disabled sections through the introduction
> > of two new kfuncs: bpf_local_irq_save, to save IRQ state and disable
> > IRQs, and bpf_local_irq_restore, to restore IRQ state and enable them
> > again.
> >
> > For the purposes of tracking the saved IRQ state, the verifier is taught
> > about a new special object on the stack of type STACK_IRQ_FLAG. This is
> > an 8-byte value which saves the IRQ flags that are to be passed back to
> > the IRQ restore kfunc.
> >
> > To track a dynamic number of IRQ-disabled regions and their associated
> > saved states, a new resource type RES_TYPE_IRQ is introduced, with its
> > state management functions acquire_irq_state and release_irq_state,
> > taking advantage of the refactoring and clean-ups made in earlier
> > commits.
> >
> > One notable requirement of the kernel's IRQ save and restore API is that
> > restores cannot happen out of order. For this purpose, resource state is
> > extended with a new type-specific member 'prev_id'. This is used to
> > remember the ordering of acquisitions of IRQ saved states, so that we
> > maintain a logical stack of resource identities in acquisition order,
> > and can enforce LIFO ordering when restoring IRQ state. The top of the
> > stack is maintained using bpf_func_state's active_irq_id.
> >
> > The logic to detect initialized and uninitialized IRQ flag slots, and to
> > mark and unmark them, is similar to how it's done for iterators. We do
> > need to update ressafe to perform a check_ids-based satisfiability
> > check, and additionally match prev_id for RES_TYPE_IRQ entries in the
> > resource array.
> >
> > The kfuncs themselves are plain wrappers over the local_irq_save and
> > local_irq_restore macros.
> >
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
>
> I think this matches what is done for iterators and dynptrs.
>
> Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
>
> [...]
>
> > @@ -263,10 +267,16 @@ struct bpf_resource_state {
> >  	 * is used purely to inform the user of a resource leak.
> >  	 */
> >  	int insn_idx;
> > -	/* Use to keep track of the source object of a lock, to ensure
> > -	 * it matches on unlock.
> > -	 */
> > -	void *ptr;
> > +	union {
> > +		/* Use to keep track of the source object of a lock, to ensure
> > +		 * it matches on unlock.
> > +		 */
> > +		void *ptr;
> > +		/* Track the reference id preceding the IRQ entry in acquisition
> > +		 * order, to enforce an ordering on the release.
> > +		 */
> > +		int prev_id;
> > +	};
>
> Nit: Do we anticipate any other resource kinds that would need LIFO
> acquire/release? If we do, an alternative to prev_id would be to
> organize bpf_func_state->res as a stack (by changing the
> erase_resource_state() implementation).

I don't think so; this was the one odd case requiring such an ordering,
so I tried to find the least intrusive way.

>
> [...]
>
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 751c150f9e1c..302f0d5976be 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
> >  	return ret + 1;
> >  }
> >
> > +/* Keep unsigned long in the prototype so that the kfunc is usable when
> > + * emitted to vmlinux.h in BPF programs directly, but since unsigned long
> > + * may potentially be 4 bytes, always cast to u64 when reading/writing
> > + * from this pointer, as it always points to an 8-byte memory region on
> > + * the BPF stack.
> > + */
> > +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)
>
> Nit: 'unsigned long long' is guaranteed to be at least 64 bits.
> What would go wrong if 'u64' is used here?
It goes like this: if I make this unsigned long long * or u64 *, the
kfunc emitted to vmlinux.h expects a pointer of that type. Kernel code
typically passes unsigned long flags to these functions, and that's what
people are used to. Given that for --target=bpf unsigned long * always
points to an 8-byte value, I just did this, so that on 32-bit kernels we
don't end up relying on unsigned long still being 8 bytes when
fetching/storing flags on the BPF stack.

>
> > +{
> > +	u64 *ptr = (u64 *)flags__irq_flag;
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	*ptr = flags;
> > +}
>
> [...]
>
> > @@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
> >  	for (i = 0; i < state->acquired_res; i++) {
> >  		struct bpf_resource_state *s = &state->res[i];
> >
> > -		if (s->type == RES_TYPE_PTR || s->type != type)
> > +		if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)
>
> Nit: I think this would be easier to read if there was a bitmask
> associated with lock types.

Ack, will fix.

>
> >  		continue;
> >
> >  		if (s->id == id && s->ptr == ptr)
>
> [...]