Re: [PATCH v6 bpf-next 0/8] mvneta: introduce XDP multi-buffer support

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Sat, 23 Jan 2021 02:03:17 +0100

Hi Lorenzo,

On 1/19/21 9:20 PM, Lorenzo Bianconi wrote:
> This series introduces XDP multi-buffer support. The mvneta driver is
> the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> please focus on how these new types of xdp_{buff,frame} packets
> traverse the different layers and the layout design. It is on purpose
> that BPF-helpers are kept simple, as we don't want to expose the
> internal layout to allow later changes.

> For now, to keep the design simple and to maintain performance, the XDP
> BPF-prog (still) only has access to the first-buffer. It is left for
> later (another patchset) to add payload access across multiple buffers.

I think xmas break has mostly wiped my memory from 2020 ;) so it would be
good to describe the sketched out design for what this will look like inside
the cover letter in terms of planned uapi exposure. (Additionally, discussing
the api design proposal could also be something for BPF office hours to move
things along quicker + posting a summary to the list for transparency of
course .. just a thought.)

Glancing over the series, while you've addressed the bpf_xdp_adjust_tail()
helper API, this series will break one assumption that programs rely on, at
least for the mvneta driver from one kernel to another once multi-buff mode
is used, and that is basically the bpf_xdp_event_output() API: the assumption
is that you can do full packet capture by passing in the xdp buff len, that
is, data_end - data. We use it this way for sampling & others might as well
(e.g. xdpcap). But bpf_xdp_copy() would only copy the first buffer today, which
would break the full pkt visibility assumption. Just walking the frags if the
xdp->mb bit is set would still need some sort of struct xdp_md exposure so
the prog can figure out the actual full size..
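
To make the concern concrete, here is a rough userspace sketch in plain C.
It is not kernel code, and every name in it (toy_xdp_buff, toy_frag,
toy_linear_len, toy_full_len, toy_capture) is made up for illustration; none
of them are part of the series or of any kernel API. It only models a frame
whose payload is split across a first buffer plus fragments, and shows that
a capture sized from data_end - data sees the first buffer only:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FRAGS 4

/* Hypothetical model of one extra fragment; not a kernel structure. */
struct toy_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical model of a multi-buffer frame; not struct xdp_buff. */
struct toy_xdp_buff {
	const unsigned char *data;     /* start of the first buffer payload */
	const unsigned char *data_end; /* end of the first buffer payload */
	int mb;                        /* 1 if fragments follow */
	size_t nr_frags;
	struct toy_frag frags[TOY_MAX_FRAGS];
};

/* Length a program can derive today: the first (linear) buffer only. */
static size_t toy_linear_len(const struct toy_xdp_buff *xdp)
{
	return (size_t)(xdp->data_end - xdp->data);
}

/* Full frame length: linear part plus all fragments when mb is set. */
static size_t toy_full_len(const struct toy_xdp_buff *xdp)
{
	size_t len = toy_linear_len(xdp);
	size_t i;

	if (xdp->mb)
		for (i = 0; i < xdp->nr_frags; i++)
			len += xdp->frags[i].len;
	return len;
}

/*
 * Copy up to 'want' bytes into dst. With walk_frags == 0 only the first
 * buffer is copied (the behaviour described above); with walk_frags != 0
 * the fragments are appended as well. Returns the number of bytes copied.
 */
static size_t toy_capture(const struct toy_xdp_buff *xdp, unsigned char *dst,
			  size_t want, int walk_frags)
{
	size_t copied, n, i;

	assert(xdp->nr_frags <= TOY_MAX_FRAGS);

	n = toy_linear_len(xdp);
	if (n > want)
		n = want;
	memcpy(dst, xdp->data, n);
	copied = n;

	if (!xdp->mb || !walk_frags)
		return copied;

	for (i = 0; i < xdp->nr_frags && copied < want; i++) {
		n = xdp->frags[i].len;
		if (n > want - copied)
			n = want - copied;
		memcpy(dst + copied, xdp->frags[i].data, n);
		copied += n;
	}
	return copied;
}
```

Even if the copy side walks the frags, the prog still has to learn the
value of toy_full_len() from somewhere in order to ask for the right size,
which is the exposure question raised above.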

> This patchset should still allow for these future extensions. The goal
> is to lift the XDP MTU restriction that comes with XDP, but maintain
> same performance as before.

> The main idea for the new multi-buffer layout is to reuse the same
> layout used for non-linear SKB. We introduced a "xdp_shared_info" data
> structure at the end of the first buffer to link together subsequent buffers.
> xdp_shared_info will alias skb_shared_info, allowing most of the frags to be
> kept in the same cache-line (while with skb_shared_info only the first
> fragment will be placed in the first "shared_info" cache-line). Moreover we
> introduced some xdp_shared_info helpers aligned to skb_frag* ones.
> Converting an xdp_frame to an SKB and delivering it to the network stack is
> shown in cpumap code (patch 7/8). When building the SKB, the xdp_shared_info
> structure will be converted into a skb_shared_info one.

> A multi-buffer bit (mb) has been introduced in xdp_{buff,frame} structure
> to notify the bpf/network layer if this is a xdp multi-buffer frame (mb = 1)
> or not (mb = 0).
> The mb bit will be set by a xdp multi-buffer capable driver only for
> non-linear frames maintaining the capability to receive linear frames
> without any extra cost since the xdp_shared_info structure at the end
> of the first buffer will be initialized only if mb is set.
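
My reading of the layout and the mb bit described above, as a rough userspace
sketch in plain C. This is only an illustration under my own assumptions, not
the code from the series: toy_buff, toy_shared_info and all helper and field
names are hypothetical stand-ins, and the real structures certainly differ.
The point is just that the shared info lives in the tail of the first buffer
and is neither initialized nor read on the linear path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_FRAME_SZ  256u
#define TOY_MAX_FRAGS 4u

/* Hypothetical stand-in for xdp_shared_info; field names are assumptions. */
struct toy_shared_info {
	unsigned int nr_frags;
	unsigned int frag_len[TOY_MAX_FRAGS];
};

/* Hypothetical stand-in for xdp_buff. */
struct toy_buff {
	unsigned char *hard_start; /* start of the first buffer */
	size_t frame_sz;           /* size of the first buffer */
	size_t len;                /* payload bytes in the first buffer */
	unsigned int mb;           /* multi-buffer bit */
};

/* Allocate the first buffer; 0xAA stands for stale data in the tail. */
static int toy_buff_alloc(struct toy_buff *b)
{
	b->hard_start = malloc(TOY_FRAME_SZ);
	if (!b->hard_start)
		return -1;
	memset(b->hard_start, 0xAA, TOY_FRAME_SZ);
	b->frame_sz = TOY_FRAME_SZ;
	b->len = 0;
	b->mb = 0;
	return 0;
}

/* The shared info sits at the very end of the first buffer. */
static struct toy_shared_info *toy_get_sinfo(const struct toy_buff *b)
{
	return (struct toy_shared_info *)(void *)
		(b->hard_start + b->frame_sz - sizeof(struct toy_shared_info));
}

/* Linear receive: the tail of the buffer is never touched. */
static void toy_rx_linear(struct toy_buff *b, size_t len)
{
	assert(len <= b->frame_sz - sizeof(struct toy_shared_info));
	b->len = len;
	b->mb = 0;
}

/* First fragment initializes the shared info and sets mb. */
static int toy_add_frag(struct toy_buff *b, unsigned int frag_len)
{
	struct toy_shared_info *si = toy_get_sinfo(b);

	if (!b->mb) {
		si->nr_frags = 0;
		b->mb = 1;
	}
	if (si->nr_frags >= TOY_MAX_FRAGS)
		return -1;
	si->frag_len[si->nr_frags++] = frag_len;
	return 0;
}

/* Total frame length; the shared info is read only when mb is set. */
static size_t toy_total_len(const struct toy_buff *b)
{
	size_t total = b->len;
	unsigned int i;

	if (b->mb) {
		const struct toy_shared_info *si = toy_get_sinfo(b);

		for (i = 0; i < si->nr_frags; i++)
			total += si->frag_len[i];
	}
	return total;
}
```

With that model, a 100 byte linear frame reports a total of 100 without the
tail ever being read, and adding a 50 byte fragment flips mb and brings the
total to 150.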

> Typical use cases for this series are:
> - Jumbo-frames
> - Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
> - TSO

> bpf_xdp_adjust_tail helper has been modified to take into account xdp
> multi-buff frames.

Also in terms of logistics (I think mentioned earlier already), for the series
to be merged - as with other networking features spanning core + driver (for
example af_xdp) - we also need a second driver (ideally mlx5, i40e or ice)
implementing this, ideally submitted together in the same series for review.
For that it probably also makes sense to more cleanly split out the core pieces
from the driver ones. Either way, how is progress on that side coming along?

Thanks,
Daniel