Re: [PATCH v10 bpf-next 8/9] bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sun, 19 Feb 2023 16:26:37 -0800

On Thu, Feb 16, 2023 at 02:55:23PM -0800, Joanne Koong wrote:
> +
> +/**
> + * bpf_dynptr_slice_rdwr - Obtain a pointer to the dynptr data.
> + *
> + * For non-skb and non-xdp type dynptrs, there is no difference between
> + * bpf_dynptr_slice and bpf_dynptr_data.
> + *
> + * @ptr: The dynptr whose data slice to retrieve
> + * @offset: Offset into the dynptr
> + * @buffer: User-provided buffer to copy contents into
> + * @buffer__sz: Size (in bytes) of the buffer. This is the length of the
> + * requested slice
> + *
> + * @returns: NULL if the call failed (eg invalid dynptr), pointer to a
> + * data slice (can be either direct pointer to the data or a pointer to the user
> + * provided buffer, with its contents containing the data, if unable to obtain
> + * direct pointer)

The doc should probably say that the returned pointer is writeable and that, when the
call had to fall back to the user-provided buffer, the user must do
if (ptr == buffer) bpf_dynptr_write() to reflect the changes in the dynptr.
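
To make the write-back rule concrete, here is a rough user-space sketch. It is not
kernel code: struct mock_pkt, mock_slice_rdwr(), mock_write() and set_first_byte() are
all hypothetical stand-ins that only mimic the "direct pointer or copy in the buffer"
contract described in the kdoc above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a packet: a linear head plus one non-linear frag. */
struct mock_pkt {
	unsigned char head[8];
	unsigned char frag[8];
};

/* Logical byte 'off' lives in head for off < 8 and in frag otherwise. */
static unsigned char *mock_byte(struct mock_pkt *p, size_t off)
{
	return off < sizeof(p->head) ? &p->head[off]
				     : &p->frag[off - sizeof(p->head)];
}

/* Models bpf_dynptr_slice_rdwr(): a direct pointer when the whole range
 * sits in head, otherwise a copy of the data placed in buf.
 */
static void *mock_slice_rdwr(struct mock_pkt *p, size_t off, void *buf, size_t len)
{
	size_t i;

	if (off + len > sizeof(p->head) + sizeof(p->frag))
		return NULL;
	if (off + len <= sizeof(p->head))
		return &p->head[off];
	for (i = 0; i < len; i++)
		((unsigned char *)buf)[i] = *mock_byte(p, off + i);
	return buf;
}

/* Models bpf_dynptr_write(): copies len bytes from src into the packet. */
static int mock_write(struct mock_pkt *p, size_t off, const void *src, size_t len)
{
	size_t i;

	if (off + len > sizeof(p->head) + sizeof(p->frag))
		return -1;
	for (i = 0; i < len; i++)
		*mock_byte(p, off + i) = ((const unsigned char *)src)[i];
	return 0;
}

/* Caller-side pattern: mutate through the slice, commit only if it was a copy. */
static int set_first_byte(struct mock_pkt *p, size_t off, unsigned char val)
{
	unsigned char buffer[4];
	unsigned char *ptr = mock_slice_rdwr(p, off, buffer, sizeof(buffer));

	if (!ptr)
		return -1;
	ptr[0] = val;
	if (ptr == buffer)	/* we edited a copy: write it back */
		return mock_write(p, off, buffer, sizeof(buffer));
	return 0;		/* direct pointer: the packet is already updated */
}
```

In this model a slice at offset 0 is modified in place, while a slice at offset 6
straddles head and frag and only reaches the packet through the mock_write() call.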

Maybe document all kfuncs the way Documentation/bpf/cpumasks.rst does?

Should we also document that bpf_dynptr_slice[_rdwr] do not change the skb
configuration, and that because of that neither skb_header_pointer() nor
bpf_xdp_pointer() invalidates the ctx->data/data_end pointers?

> + */
> +__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
> +					void *buffer, u32 buffer__sz)
> +{
> +	if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
> +		return 0;
> +
> +	/* bpf_dynptr_slice_rdwr is the same logic as bpf_dynptr_slice.
> +	 *
> +	 * For skb-type dynptrs, the verifier has already ensured that the skb
> +	 * head is writable (see bpf_unclone_prologue()).
> +	 */

This is way too terse. It needs a much more detailed comment explaining why it's safe
to write into the returned pointer.
For example, it's far from obvious that bpf_dynptr_slice()->skb_header_pointer()
either returns a pointer into the head or copies into the buffer, and does _only_ that.
Without looking into the implementation details, one could imagine a skb_header_pointer()
that returns a pointer into the middle of a frag when the {offset, len} combination allows it.
In that case it would not be safe to write through such a pointer,
because bpf_unclone_prologue() only makes sure that the head is writeable.
One can look at bpf_unclone_prologue() and see that it's doing bpf_skb_pull_data(skb, 0);
but without looking further it's also not at all obvious that arg2 == 0 means
'make head writeable'.
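
The property the comment should spell out can be shown with a small user-space model.
This is only a sketch paraphrasing my reading of skb_header_pointer(); struct mock_skb,
mock_copy_bits() and mock_header_pointer() are hypothetical names, and the real helper
should be checked in include/linux/skbuff.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for an skb: linear head plus one frag. */
struct mock_skb {
	unsigned char head[16];
	int hlen;		/* bytes in the linear head, models skb_headlen() */
	unsigned char frag[16];
	int frag_len;		/* bytes held in the non-linear part */
};

/* Models skb_copy_bits(): gather len bytes starting at offset into 'to'. */
static int mock_copy_bits(const struct mock_skb *skb, int offset, void *to, int len)
{
	unsigned char *dst = to;
	int i;

	if (offset + len > skb->hlen + skb->frag_len)
		return -1;
	for (i = 0; i < len; i++) {
		int pos = offset + i;

		dst[i] = pos < skb->hlen ? skb->head[pos]
					 : skb->frag[pos - skb->hlen];
	}
	return 0;
}

/* Models skb_header_pointer(): either a pointer into the head, or a copy
 * in 'buffer'. There is no third case that points into a frag.
 */
static void *mock_header_pointer(const struct mock_skb *skb, int offset,
				 int len, void *buffer)
{
	if (offset < 0 || len < 0)	/* guard added for the model only */
		return NULL;
	if (skb->hlen - offset >= len)
		return (void *)(skb->head + offset);
	if (!buffer || mock_copy_bits(skb, offset, buffer, len) < 0)
		return NULL;
	return buffer;
}
```

Note that in this model a range lying entirely inside the frag (say offset 20, len 4
with hlen 16) still comes back as 'buffer', never as a pointer into frag[]. That is
the guarantee that makes a writeable head sufficient.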

Also 'For skb-type dynptrs, the verifier has already ensured that the skb head is writable'
is only partially true.
skb-type dynptrs are also available to cgroup-scoped skb hooks, and there bpf_dynptr_slice_rdwr()
will always fail, since bpf_dynptr_is_rdonly() will be true.
It would probably be a better user experience if the verifier rejected
bpf_dynptr_slice_rdwr() in hooks where may_access_direct_pkt_data() returns false.

> +	return bpf_dynptr_slice(ptr, offset, buffer, buffer__sz);
> +}
> +
...
> +			} else if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice] ||
> +				   meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) {
> +				enum bpf_type_flag type_flag = get_dynptr_type_flag(meta.initialized_dynptr.type);
> +
> +				mark_reg_known_zero(env, regs, BPF_REG_0);
> +
> +				if (!tnum_is_const(regs[BPF_REG_4].var_off)) {
> +					verbose(env, "mem_size must be a constant\n");
> +					return -EINVAL;
> +				}
> +				regs[BPF_REG_0].mem_size = regs[BPF_REG_4].var_off.value;
> +
> +				/* PTR_MAYBE_NULL will be added when is_kfunc_ret_null is checked */
> +				regs[BPF_REG_0].type = PTR_TO_MEM | type_flag;
> +
> +				if (meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice])
> +					regs[BPF_REG_0].type |= MEM_RDONLY;
> +				else
> +					env->seen_direct_write = true;

This bit kind of makes bpf_dynptr_slice_rdwr() "fail" in the cg-skb hook,
but it will do so via:
        if (ops->gen_prologue || env->seen_direct_write) {
                if (!ops->gen_prologue) {
                        verbose(env, "bpf verifier is misconfigured\n");
                        return -EINVAL;
                }

which will confuse users.
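
To illustrate the difference in user experience, here is a toy user-space model of the
two rejection paths. Everything in it is hypothetical: struct mock_prog_ops, late_check()
and early_check() are made-up names, the cg-skb settings reflect what this email assumes
about that hook, and the wording of the early error message is only a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of what a program type offers. */
struct mock_prog_ops {
	bool has_gen_prologue;	/* models ops->gen_prologue != NULL */
	bool direct_pkt_write;	/* models may_access_direct_pkt_data() for writes */
};

/* Models the existing late check quoted above: returns the error text
 * the user would see, or NULL if the program is accepted.
 */
static const char *late_check(const struct mock_prog_ops *ops, bool seen_direct_write)
{
	if (ops->has_gen_prologue || seen_direct_write) {
		if (!ops->has_gen_prologue)
			return "bpf verifier is misconfigured";
	}
	return NULL;
}

/* Models the suggested early check at the kfunc call site, with a
 * message (placeholder wording) that names the actual problem.
 */
static const char *early_check(const struct mock_prog_ops *ops, bool calls_slice_rdwr)
{
	if (calls_slice_rdwr && !ops->direct_pkt_write)
		return "the prog does not allow writes to packet data";
	return NULL;
}
```

With a cg-skb-like ops (no gen_prologue, no direct packet writes) the late check reports
a misconfigured verifier, while the early check points at the disallowed write itself.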