	Hello,

On Fri, 9 Jun 2023, Terin Stock wrote:

> When using encapsulation the original packet's headers are copied to the
> inner headers. This preserves the space for an inner mac header, which
> is not used by the inner payloads for the encapsulation types supported
> by IPVS. If a packet is using GUE or GRE encapsulation and needs to be
> segmented, flow can be passed to __skb_udp_tunnel_segment() which
> calculates a negative tunnel header length. A negative tunnel header
> length causes pskb_may_pull() to fail, dropping the packet.
>
> This can be observed by attaching probes to ip_vs_in_hook(),
> __dev_queue_xmit(), and __skb_udp_tunnel_segment():
>
>     perf probe --add '__dev_queue_xmit skb->inner_mac_header \
>       skb->inner_network_header skb->mac_header skb->network_header'
>     perf probe --add '__skb_udp_tunnel_segment:7 tnl_hlen'
>     perf probe -m ip_vs --add 'ip_vs_in_hook skb->inner_mac_header \
>       skb->inner_network_header skb->mac_header skb->network_header'
>
> These probes print the headers and tunnel header length for packets which
> traverse the IPVS encapsulation path. A TCP packet can be forced into
> the segmentation path by being smaller than a calculated clamped MSS,
> but larger than the advertised MSS.
>
>     probe:ip_vs_in_hook: inner_mac_header=0x0 inner_network_header=0x0 mac_header=0x44 network_header=0x52
>     probe:ip_vs_in_hook: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
>     probe:dev_queue_xmit: inner_mac_header=0x44 inner_network_header=0x52 mac_header=0x44 network_header=0x32
>     probe:__skb_udp_tunnel_segment_L7: tnl_hlen=-2
>
> When using veth-based encapsulation, the interfaces are set to be
> mac-less, which does not preserve space for an inner mac header. This
> prevents this issue from occurring.
>
> In our real-world testing of sending a 32KB file we observed operation
> time increasing from ~75ms for veth-based encapsulation to over 1.5s
> using IPVS encapsulation due to retries from dropped packets.
>
> This changeset modifies the packet on the encapsulation path in
> ip_vs_tunnel_xmit() and ip_vs_tunnel_xmit_v6() to remove the inner mac
> header offset. This fixes UDP segmentation for both encapsulation types,
> and corrects the inner headers for any IPIP flows that may use it.
>
> Fixes: 84c0d5e96f3a ("ipvs: allow tunneling with gue encapsulation")
> Signed-off-by: Terin Stock <terin@xxxxxxxxxxxxxx>

	Looks good to me for nf/net tree, thanks!

Acked-by: Julian Anastasov <ja@xxxxxx>

> ---
>  net/netfilter/ipvs/ip_vs_xmit.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index c7652da78c88..9193e109e6b3 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -1207,6 +1207,7 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
>  	skb->transport_header = skb->network_header;
>
>  	skb_set_inner_ipproto(skb, next_protocol);
> +	skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
>
>  	if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
>  		bool check = false;
> @@ -1349,6 +1350,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
>  	skb->transport_header = skb->network_header;
>
>  	skb_set_inner_ipproto(skb, next_protocol);
> +	skb_set_inner_mac_header(skb, skb_inner_network_offset(skb));
>
>  	if (tun_type == IP_VS_CONN_F_TUNNEL_TYPE_GUE) {
>  		bool check = false;
> --
> 2.40.1

Regards

--
Julian Anastasov <ja@xxxxxx>
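
For readers following the arithmetic, here is a small userspace sketch of how the probe values above lead to tnl_hlen=-2, and what the same subtraction gives once the inner mac header is moved onto the inner network header. It is only a model: struct hdr_offsets and the helper names are hypothetical and are not kernel code. It also assumes the outer header is IPv4 without options (20 bytes), so that the transport header sits at network_header + 20; the probes did not record transport_header directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff header offsets seen in the probe
 * output. This is NOT the kernel's struct sk_buff. */
struct hdr_offsets {
	uint16_t inner_mac_header;
	uint16_t inner_network_header;
	uint16_t network_header;
	uint16_t transport_header;
};

/* Assumption: outer IPv4 header without options. */
enum { IPV4_HLEN = 20 };

/* Same subtraction __skb_udp_tunnel_segment() uses for the tunnel header
 * length: inner mac header position minus transport header position. */
static int tnl_hlen(const struct hdr_offsets *o)
{
	return (int)o->inner_mac_header - (int)o->transport_header;
}

/* Offsets after IPVS encapsulation, copied from the probe output. */
static struct hdr_offsets encapsulated_offsets(void)
{
	struct hdr_offsets o = {
		.inner_mac_header     = 0x44,
		.inner_network_header = 0x52,
		.network_header       = 0x32,
	};

	/* Outer UDP header follows the outer IPv4 header (assumed). */
	o.transport_header = o.network_header + IPV4_HLEN;
	assert(o.transport_header == 0x46);
	return o;
}

/* Models the effect of the added skb_set_inner_mac_header() call: the
 * inner mac header offset becomes the inner network header offset. */
static void apply_fix(struct hdr_offsets *o)
{
	o->inner_mac_header = o->inner_network_header;
}
```

Under these assumptions the stale inner mac header at 0x44 lies two bytes before the outer UDP header at 0x46, which reproduces the observed -2. After apply_fix() the difference is 0x52 - 0x46 = 12 bytes, which would be consistent with an 8-byte UDP header followed by a 4-byte GUE base header.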