Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size

On Tue, 2020-04-14 at 14:46 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 9 Apr 2020 03:31:14 +0000
> Saeed Mahameed <saeedm@xxxxxxxxxxxx> wrote:
> 
> > On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> > > Finally, after all drivers have a frame size, allow BPF-helper
> > > bpf_xdp_adjust_tail() to grow or extend packet size at frame
> > > tail.
> > >   
> > 
> > Can you provide a list of use cases for why tail extension is
> > necessary?
> 
> Use-cases:
> (1) IPsec / XFRM needs a tail extend[1][2].
> (2) DNS-cache replies in XDP.
> (3) HA-proxy ALOHA would need it to convert to XDP.
>  
> > And what do you have in mind as the immediate use of
> > bpf_xdp_adjust_tail()?
> 
> I guess Steffen Klassert's IPsec use-case (1) is the most immediate.
> 
> [1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
> [2] http://vger.kernel.org/netconf2019.html
> 

Thanks !

> > Neither the cover letter nor the commit messages listed any actual
> > use case.
> 
> Sorry about that.
> 
> > > Remember that helper/macro xdp_data_hard_end has reserved some
> > > tailroom.  Thus, this helper makes sure that the BPF-prog doesn't
> > > have access to this tailroom area.
> > > 
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> > > ---
> > >  include/uapi/linux/bpf.h |    4 ++--
> > >  net/core/filter.c        |   18 ++++++++++++++++--
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > 
> [... cut ...]
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 7628b947dbc3..4d58a147eed0 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto
> > > bpf_xdp_adjust_head_proto = {
> > >  
> > >  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int,
> > > offset)
> > >  {
> > > +	void *data_hard_end = xdp_data_hard_end(xdp);
> > >  	void *data_end = xdp->data_end + offset;
> > >  
> > > -	/* only shrinking is allowed for now. */
> > > -	if (unlikely(offset >= 0))
> > > +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> > > +	if (unlikely(data_end > data_hard_end))
> > >  		return -EINVAL;
> > >    
> > 
> > I don't know if I like this approach, for a couple of reasons.
> > 
> > 1. Drivers will provide an arbitrary frame_sz, which is normally
> > larger than the MTU and could be a full page. For the XDP_TX action
> > this can be problematic if XDP progs are allowed to produce oversized
> > packets that only get caught at the driver level.
> 
> We already check if the MTU is exceeded for a specific device when we
> redirect into it, see helper xdp_ok_fwd_dev().  For the XDP_TX case,
> I guess some drivers bypass that check, which should be fixed. The
> XDP_TX case is IMHO a place where we allow drivers to do special
> optimizations, thus drivers can choose to do something faster than
> calling the generic helper xdp_ok_fwd_dev().
>   
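
The MTU check mentioned above can be illustrated with a small user-space
model. This is only a sketch: struct model_netdev, MODEL_VLAN_HLEN and
model_ok_fwd_dev() are hypothetical stand-ins, loosely modeled on what
xdp_ok_fwd_dev() is understood to check (device is up, packet length does
not exceed MTU plus L2 header plus one VLAN tag). It is not the kernel
code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_VLAN_HLEN 4 /* bytes in one 802.1Q tag */

/* Hypothetical stand-in for the few net_device fields used here. */
struct model_netdev {
	unsigned int mtu;             /* L3 payload limit, e.g. 1500 */
	unsigned int hard_header_len; /* L2 header length, e.g. 14 */
	int up;                       /* non-zero when the interface is up */
};

/* Return 0 if a frame of pktlen bytes may be sent on dev, else -errno. */
static int model_ok_fwd_dev(const struct model_netdev *dev,
			    unsigned int pktlen)
{
	unsigned int max_len;

	assert(dev != NULL);

	if (!dev->up)
		return -ENETDOWN;

	/* Allow the L2 header and one VLAN tag on top of the MTU. */
	max_len = dev->mtu + dev->hard_header_len + MODEL_VLAN_HLEN;
	if (pktlen > max_len)
		return -EMSGSIZE;

	return 0;
}
```

A driver that takes a faster XDP_TX path would need some equivalent
length check once BPF programs are able to grow the packet.
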
> > 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb
> > shinfo
> > and it introduces a reverse dependency between xdp buff and skbuff 
> > 
> (I'll address this in another mail)
> 
> > Both of the above can be solved if the drivers provide the max
> > allowed frame size, already accounting for MTU and shinfo, when
> > setting xdp_buff.frame_sz at the driver level.
> 
> It seems we look at the problem from two different angles.  You have
> the driver's perspective, while I have the network stack's perspective
> (the XDP_PASS case).  The mlx5 driver treats XDP as a special case, by
> hiding or confining xdp_buff to functions fairly deep in the
> call-stack.  My goal is different (moving SKB out of drivers): I see
> the xdp_buff/xdp_frame as the main packet object in the drivers, which
> gets sent up the network stack (after converting to xdp_frame) and
> converted into an SKB in core-code (yes, there is a long road ahead).
> The larger tailroom can be used by the netstack in SKB-coalesce.
> 

But to achieve a proper model, the drivers must be told how much
tailroom they must preserve. Right now we are just hardcoding it, in a
place where it doesn't even belong. I don't know what the right solution
is yet, and we are still not there. Once we move memory management
entirely out of the driver, we might have a better way to preserve
headroom and tailroom.
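
To make the hardcoded reservation concrete, here is a user-space sketch
of the tail-grow check from the patch. The names model_xdp_buff,
model_data_hard_end() and model_adjust_tail() are hypothetical, and
MODEL_RESERVED_TAILROOM is an assumed constant standing in for the
aligned sizeof(struct skb_shared_info) that xdp_data_hard_end()
reserves. The minimum-length check on shrink (MODEL_ETH_HLEN) is
likewise an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* ASSUMPTION: stands in for the aligned skb_shared_info size. */
#define MODEL_RESERVED_TAILROOM 320
/* ASSUMPTION: smallest packet the model lets a program shrink to. */
#define MODEL_ETH_HLEN 14

struct model_xdp_buff {
	unsigned char *data_hard_start; /* start of the frame memory */
	unsigned char *data;            /* start of packet data */
	unsigned char *data_end;        /* end of packet data */
	size_t frame_sz;                /* full frame size set by driver */
};

/* End of the area a BPF program may use: frame end minus the reserve. */
static unsigned char *model_data_hard_end(const struct model_xdp_buff *xdp)
{
	return xdp->data_hard_start + xdp->frame_sz - MODEL_RESERVED_TAILROOM;
}

/* Grow (offset > 0) or shrink (offset < 0) the packet at its tail. */
static int model_adjust_tail(struct model_xdp_buff *xdp, int offset)
{
	ptrdiff_t tailroom = model_data_hard_end(xdp) - xdp->data_end;
	ptrdiff_t len = xdp->data_end - xdp->data;

	assert(len >= 0 && tailroom >= 0);

	/* Growing must not reach into the reserved tailroom. */
	if (offset > tailroom)
		return -EINVAL;
	/* Shrinking must leave at least a minimal header. */
	if (len + offset < MODEL_ETH_HLEN)
		return -EINVAL;

	xdp->data_end += offset;
	return 0;
}
```

Note that the reserve is a property of the helper here, not something
the driver was asked to provide, which is the point raised above.
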

> The next step is making xdp_buff (and xdp_frame) multi-buffer aware.
> This is why I reserve room for skb_shared_info.  I have considered

This needs to be carefully crafted, as we don't want to end up with one
more SKB-type thing to deal with.


> reducing xdp_buff.frame_sz by sizeof(skb_shared_info),
> but it got kind of ugly having this in each driver.
> 

Could this be done via memory model registration?

> I also considered having drivers set up a direct pointer to the
> {skb,xdp}_shared_info section in xdp_buff, because it will make it more
> flexible (for what I imagine Alexander Duyck wants).  (But we can still
> do/change that later, once we start work on the multi-buffer code.)
> 

You mean something like xdp->data_tail or xdp->data_hard_end?
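
One possible reading of that idea, as a user-space sketch: the driver
stores an explicit limit pointer when it sets up the buffer, already
accounting for whatever reserve and maximum packet length it wants, and
the helper only compares against that pointer. Everything here is
hypothetical (model_xdp_buff2, its data_hard_end field,
model_driver_init(), model_grow_tail()); no such field exists in the
patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_xdp_buff2 {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_hard_end; /* hypothetical: set by the driver */
};

/*
 * Driver-side setup: the limit is the smaller of the usable frame space
 * (frame minus headroom and reserve) and the driver's max packet length.
 */
static void model_driver_init(struct model_xdp_buff2 *xdp,
			      unsigned char *frame, size_t frame_sz,
			      size_t headroom, size_t pkt_len,
			      size_t reserve, size_t max_pkt_len)
{
	size_t room;

	assert(headroom + pkt_len + reserve <= frame_sz);

	room = frame_sz - reserve - headroom;
	if (max_pkt_len < room)
		room = max_pkt_len;
	assert(pkt_len <= room);

	xdp->data = frame + headroom;
	xdp->data_end = xdp->data + pkt_len;
	xdp->data_hard_end = xdp->data + room;
}

/* Helper-side check: no knowledge of skb_shared_info or MTU needed. */
static int model_grow_tail(struct model_xdp_buff2 *xdp, size_t bytes)
{
	size_t tailroom = (size_t)(xdp->data_hard_end - xdp->data_end);

	if (bytes > tailroom)
		return -EINVAL;

	xdp->data_end += bytes;
	return 0;
}
```

The trade-off in this variant is one extra pointer per buffer in
exchange for keeping the reserve policy in the driver rather than in
the core helper.
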



