Re: [PATCH v14 bpf-next 00/18] mvneta: introduce XDP multi-buffer support

Jakub Kicinski <kuba@xxxxxxxxxx> · Thu, 16 Sep 2021 09:55:39 -0700

On Fri, 10 Sep 2021 18:14:06 +0200 Lorenzo Bianconi wrote:
> The two following ebpf helpers (and related selftests) have been introduced:
> - bpf_xdp_adjust_data:
>   Move xdp_md->data and xdp_md->data_end pointers in subsequent fragments
>   according to the offset provided by the ebpf program. This helper can be
>   used to read/write values in frame payload.
> - bpf_xdp_get_buff_len:
>   Return the total frame size (linear + paged parts)

> More info about the main idea behind this approach can be found here [1][2].

Is there much critique of the skb helpers we have? My intuition would
be to follow a similar paradigm from the API perspective. It may seem
trivial to us to switch between the two but "normal" users could easily
be confused.

By skb paradigm I mean bpf_skb_pull_data() and
bpf_skb_load_bytes()/bpf_skb_store_bytes().

Alternatively, how about we produce a variation on skb_header_pointer()
(use an on-stack buffer, or direct access if the entire region is in one
frag)?

bpf_xdp_adjust_data() seems to add cost to helpers and TBH I'm not sure
how practical it would be for applications. My understanding is that the
application is not supposed to make assumptions about the fragment
geometry, meaning data can be split at any point. Parsing data
arbitrarily split into buffers is hard if pull() is not an option, let
alone making such parsing provably correct.

Won't applications end up building something like skb_header_pointer()
based on bpf_xdp_adjust_data(), anyway? In which case why don't we
provide them what they need?

Say:

void *xdp_mb_pointer(struct xdp_buff *xdp_md, u32 flags,
                     u32 offset, u32 len, void *stack_buf)

flags and offset can be squashed into one u64 as needed. The helper
returns a pointer to packet data, either the real one or stack_buf. The
verifier has to be taught that the return value is NULL or a pointer
which is safe with offsets up to @len.
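
To make the intended semantics concrete, here is a rough userspace model
of the read side. It is only a sketch, not kernel code: struct frag,
struct mb_frame and mb_pointer() are hypothetical names, frags[0] stands
in for the linear part, and the flags argument is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a multi-buffer frame (not kernel code). */
struct frag {
	uint8_t *data;
	uint32_t len;
};

struct mb_frame {
	struct frag *frags;	/* frags[0] plays the role of the linear part */
	uint32_t nr_frags;
};

/*
 * Return a pointer to @len bytes starting at @offset in the frame.
 * If the region sits inside one fragment, point straight at it.
 * Otherwise gather the bytes into @stack_buf and return that.
 * NULL means the region is empty or runs past the end of the frame
 * (@stack_buf may have been partially written in that case).
 */
static void *mb_pointer(const struct mb_frame *f, uint32_t offset,
			uint32_t len, void *stack_buf)
{
	uint8_t *dst = stack_buf;
	uint32_t i, copied = 0;

	if (len == 0)
		return NULL;

	for (i = 0; i < f->nr_frags; i++) {
		const struct frag *fr = &f->frags[i];
		uint32_t chunk;

		if (offset >= fr->len) {
			/* Region starts in a later fragment. */
			offset -= fr->len;
			continue;
		}
		if (copied == 0 && len <= fr->len - offset)
			return fr->data + offset;	/* direct access */

		/* Region crosses a fragment boundary: copy piecewise. */
		chunk = fr->len - offset;
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(dst + copied, fr->data + offset, chunk);
		copied += chunk;
		offset = 0;
		if (copied == len)
			return stack_buf;
	}
	return NULL;
}
```

With two 4-byte fragments "ABCD" and "EFGH", asking for 2 bytes at
offset 1 yields a pointer into the first fragment, while 4 bytes at
offset 2 are gathered into stack_buf as "CDEF". The caller parses the
result the same way in both cases, which is the point of the proposal.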

If the reason for access is write we'd also need:

void *xdp_mb_pointer_flush(struct xdp_buff *xdp_md, u32 flags,
                           u32 offset, u32 len, void *stack_buf)

Same inputs; if the stack buffer was used it writes the data back to
the frame, otherwise it is a nop.
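
A matching userspace sketch of the write-back step, under the same kind
of assumptions: struct frag, struct mb_frame and mb_pointer_flush() are
hypothetical names, flags are omitted, and the model returns an int
status (0 or -1) purely for convenience. It decides whether the stack
buffer was used by re-checking whether the region fits in one fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace model of a multi-buffer frame (not kernel code). */
struct frag {
	uint8_t *data;
	uint32_t len;
};

struct mb_frame {
	struct frag *frags;	/* frags[0] plays the role of the linear part */
	uint32_t nr_frags;
};

/*
 * Write @len bytes from @stack_buf back to the frame at @offset.
 * If the region lies within a single fragment the caller was handed a
 * direct pointer, so there is nothing to do. Returns 0 on success and
 * -1 if the region does not fit in the frame (nothing is written).
 */
static int mb_pointer_flush(struct mb_frame *f, uint32_t offset,
			    uint32_t len, const void *stack_buf)
{
	const uint8_t *src = stack_buf;
	uint32_t i, total = 0, copied = 0;

	/* Bounds check first so a failed flush never writes partially. */
	for (i = 0; i < f->nr_frags; i++)
		total += f->frags[i].len;
	if (offset > total || len > total - offset)
		return -1;
	if (len == 0)
		return 0;

	for (i = 0; i < f->nr_frags; i++) {
		struct frag *fr = &f->frags[i];
		uint32_t chunk;

		if (offset >= fr->len) {
			/* Region starts in a later fragment. */
			offset -= fr->len;
			continue;
		}
		if (copied == 0 && len <= fr->len - offset)
			return 0;	/* direct access was used: nop */

		/* Scatter the stack buffer across the fragments. */
		chunk = fr->len - offset;
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(fr->data + offset, src + copied, chunk);
		copied += chunk;
		offset = 0;
		if (copied == len)
			return 0;
	}
	return -1;	/* not reached once the bounds check has passed */
}
```

In this model, flushing "wxyz" at offset 2 over fragments "ABCD" and
"EFGH" leaves them as "ABwx" and "yzGH", while a flush of a region
contained in one fragment leaves the frame untouched.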

Sorry for the longish email, and apologies if I'm missing something
obvious or something that was discussed earlier.

The other thing I wanted to double check - was the decision on program
compatibility made? Is a new program type an option? It'd be extremely
useful operationally to be able to depend on kernel enforcement.