Re: [PATCH bpf-next v1 5/7] bpf: Introduce support for bpf_local_irq_{save,restore}

Eduard Zingerman <eddyz87@xxxxxxxxx> · Thu, 21 Nov 2024 12:21:02 -0800

On Wed, 2024-11-20 at 16:53 -0800, Kumar Kartikeya Dwivedi wrote:
> Teach the verifier about IRQ-disabled sections through the introduction
> of two new kfuncs, bpf_local_irq_save, to save IRQ state and disable
> them, and bpf_local_irq_restore, to restore IRQ state and enable them
> back again.
> 
> For the purposes of tracking the saved IRQ state, the verifier is taught
> about a new special object on the stack of type STACK_IRQ_FLAG. This is
> a 8 byte value which saves the IRQ flags which are to be passed back to
> the IRQ restore kfunc.
> 
> To track a dynamic number of IRQ-disabled regions and their associated
> saved states, a new resource type RES_TYPE_IRQ is introduced, which its
> state management functions: acquire_irq_state and release_irq_state,
> taking advantage of the refactoring and clean ups made in earlier
> commits.
> 
> One notable requirement of the kernel's IRQ save and restore API is that
> they cannot happen out of order. For this purpose, resource state is
> extended with a new type-specific member 'prev_id'. This is used to
> remember the ordering of acquisitions of IRQ saved states, so that we
> maintain a logical stack in acquisition order of resource identities,
> and can enforce LIFO ordering when restoring IRQ state. The top of the
> stack is maintained using bpf_func_state's active_irq_id.
> 
> The logic to detect initialized and unitialized irq flag slots, marking
> and unmarking is similar to how it's done for iterators. We do need to
> update ressafe to perform check_ids based satisfiability check, and
> additionally match prev_id for RES_TYPE_IRQ entries in the resource
> array.
> 
> The kfuncs themselves are plain wrappers over local_irq_save and
> local_irq_restore macros.
> 
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> ---

I think this matches what is done for iterators and dynptrs.

Acked-by: Eduard Zingerman <eddyz87@xxxxxxxxx>

[...]

> @@ -263,10 +267,16 @@ struct bpf_resource_state {
>  	 * is used purely to inform the user of a resource leak.
>  	 */
>  	int insn_idx;
> -	/* Use to keep track of the source object of a lock, to ensure
> -	 * it matches on unlock.
> -	 */
> -	void *ptr;
> +	union {
> +		/* Use to keep track of the source object of a lock, to ensure
> +		 * it matches on unlock.
> +		 */
> +		void *ptr;
> +		/* Track the reference id preceding the IRQ entry in acquisition
> +		 * order, to enforce an ordering on the release.
> +		 */
> +		int prev_id;
> +	};

Nit:  Do we anticipate any other resource kinds that would need LIFO acquire/release?
      If we do, an alternative to prev_id would be to organize bpf_func_state->res as
      a stack (by changing erase_resource_state() implementation).

[...]

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 751c150f9e1c..302f0d5976be 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3057,6 +3057,28 @@ __bpf_kfunc int bpf_copy_from_user_str(void *dst, u32 dst__sz, const void __user
>  	return ret + 1;
>  }
>  
> +/* Keep unsinged long in prototype so that kfunc is usable when emitted to
> + * vmlinux.h in BPF programs directly, but since unsigned long may potentially
> + * be 4 byte, always cast to u64 when reading/writing from this pointer as it
> + * always points to an 8-byte memory region in BPF stack.
> + */
> +__bpf_kfunc void bpf_local_irq_save(unsigned long *flags__irq_flag)

Nit: 'unsigned long long' is guaranteed to be at-least 64 bit.
     What would go wrong if 'u64' is used here?

> +{
> +	u64 *ptr = (u64 *)flags__irq_flag;
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	*ptr = flags;
> +}

[...]

> @@ -1447,7 +1607,7 @@ static struct bpf_resource_state *find_lock_state(struct bpf_func_state *state,
>  	for (i = 0; i < state->acquired_res; i++) {
>  		struct bpf_resource_state *s = &state->res[i];
>  
> -		if (s->type == RES_TYPE_PTR || s->type != type)
> +		if (s->type < __RES_TYPE_LOCK_BEGIN || s->type != type)

Nit: I think this would be easier to read if there was a bitmask
     associated with lock types.

>  			continue;
>  
>  		if (s->id == id && s->ptr == ptr)

[...]