On Tue, 2020-04-14 at 14:46 +0200, Jesper Dangaard Brouer wrote: > On Thu, 9 Apr 2020 03:31:14 +0000 > Saeed Mahameed <saeedm@xxxxxxxxxxxx> wrote: > > > On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote: > > > Finally, after all drivers have a frame size, allow BPF-helper > > > bpf_xdp_adjust_tail() to grow or extend packet size at frame > > > tail. > > > > > > > can you provide a list of usecases for why tail extension is > > necessary > > ? > > Use-cases: > (1) IPsec / XFRM needs a tail extend[1][2]. > (2) DNS-cache replies in XDP. > (3) HA-proxy ALOHA would need it to convert to XDP. > > > and what do you have in mind as immediate use of > > bpf_xdp_adjust_tail() > > ? > > I guess Steffen Klassert's ipsec use-case(1) it the most immediate. > > [1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf > [2] http://vger.kernel.org/netconf2019.html > Thanks ! > > both cover letter and commit messages didn't list any actual use > > case.. > > Sorry about that. > > > > Remember that helper/macro xdp_data_hard_end have reserved some > > > tailroom. Thus, this helper makes sure that the BPF-prog don't > > > have > > > access to this tailroom area. > > > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > > > --- > > > include/uapi/linux/bpf.h | 4 ++-- > > > net/core/filter.c | 18 ++++++++++++++++-- > > > 2 files changed, 18 insertions(+), 4 deletions(-) > > > > [... cut ...] > > > diff --git a/net/core/filter.c b/net/core/filter.c > > > index 7628b947dbc3..4d58a147eed0 100644 > > > --- a/net/core/filter.c > > > +++ b/net/core/filter.c > > > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto > > > bpf_xdp_adjust_head_proto = { > > > > > > BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, > > > offset) > > > { > > > + void *data_hard_end = xdp_data_hard_end(xdp); > > > void *data_end = xdp->data_end + offset; > > > > > > - /* only shrinking is allowed for now. */ > > > - if (unlikely(offset >= 0)) > > > + /* Notice that xdp_data_hard_end have reserved some tailroom */ > > > + if (unlikely(data_end > data_hard_end)) > > > return -EINVAL; > > > > > > > i don't know if i like this approach for couple of reasons. > > > > 1. drivers will provide arbitrary frames_sz, which is normally > > larger > > than mtu, and could be a full page size, for XDP_TX action this can > > be > > problematic if xdp progs will allow oversized packets to get caught > > at > > the driver level.. > > We already check if MTU is exceeded for a specific device when we > redirect into this, see helper xdp_ok_fwd_dev(). For the XDP_TX > case, > I guess some drivers bypass that check, which should be fixed. The > XDP_TX case is IMHO a place where we allow drivers do special > optimizations, thus drivers can choose to do something faster than > calling generic helper xdp_ok_fwd_dev(). > > > 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb > > shinfo > > and it introduces a reverse dependency between xdp buff and skbuff > > > (I'll address this in another mail) > > > both of the above can be solved if the drivers provided the max > > allowed frame size, already accounting for mtu and shinfo when > > setting > > xdp_buff.frame_sz at the driver level. > > It seems we look at the problem from two different angles. You have > the drivers perspective, while I have the network stacks perspective > (the XDP_PASS case). The mlx5 driver treats XDP as a special case, > by > hiding or confining xdp_buff to functions fairly deep in the > call-stack. My goal is different (moving SKB out of drivers), I see > the xdp_buff/xdp_frame as the main packet object in the drivers, that > gets send up the network stack (after converting to xdp_frame) and > converted into SKB in core-code (yes, there is a long road-ahead). > The > larger tailroom can be used by netstack in SKB-coalesce. > But to achieve a proper model, the drivers must be notified about the size of the tailroom they must preserve, now we are just hardcoding it, where it even doesn't belong. I don't know what the right solution yet. but we are still not there .. once we totally move memory management out of the driver, then we might have a better way to preserve head and tail-room .. > The next step is making xdp_buff (and xdp_frame) multi-buffer aware. > This is why I reserve room for skb_shared_info. I have considered this needs to be carefully crafted.. as we don't want to endup with one more SKB type thing to deal with.. > reducing the size of xdp_buff.frame_sz, with sizeof(skb_shared_info), > but it got kind of ugly having this in each drivers. > can be done via memory model registration ? > I also considered having drivers setup a direct pointer to > {skb,xdp}_shared_info section in xdp_buff, because will make it more > flexible (for what I imagined Alexander Duyck want). (But we can > still > do/change that later, once we start work in multi-buffer code) > you mean something like xdp->data_tail or xdp->data_hard_end ?