Richard Gobert wrote:
> Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
> additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
> complete phase of gro. The functions always return skb->network_header,
> which in the case of encapsulated packets at the gro complete phase, is
> always set to the innermost L3 of the packet. That means that calling
> {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
> gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
> L3/L4 may return an unexpected value.
>
> This incorrect usage leads to a bug in GRO's UDP socket lookup.
> udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
> *_hdr functions return network_header which will point to the innermost L3,
> resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
> encapsulated packets.
>
> To fix this issue p_off param is used in *_gro_complete to pass off the
> offset of the previous layer.

What exactly does this mean?

This patch changes the definition of gro_complete to add a thoff
alongside the existing "nhoff"..

> -       int     (*gro_complete)(struct sk_buff *skb, int nhoff);
> +       int     (*gro_complete)(struct sk_buff *skb, int nhoff,
> +                               int thoff);

.. but also fixes up implementations to interpret the existing
argument as a thoff

> -INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
> +INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int nhoff,
> +                                              int thoff)
>  {
> -       const struct iphdr *iph = ip_hdr(skb);
> -       struct tcphdr *th = tcp_hdr(skb);
> +       const struct iphdr *iph = (const struct iphdr *)(skb->data + nhoff);
> +       struct tcphdr *th = (struct tcphdr *)(skb->data + thoff);

But in some cases the new argument is not nhoff but p_off, e.g.,

>  static int geneve_gro_complete(struct sock *sk, struct sk_buff *skb,
> -                               int nhoff)
> +                               int p_off, int nhoff)

Really, the argument is the start of the next header; each callback
just casts to its expected header (ethhdr, tcphdr, etc.)

The only place where we need to pass an extra argument is in udp,
because that needs a pointer to the network header right before the
transport header pointed to by nhoff.

And only due to possible IPv4 options or IPv6 extension headers, we
cannot just do

+       struct iphdr *iph = (struct iphdr *)(skb->data + nhoff - sizeof(*iph));
        struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);

I also do not immediately see a way to avoid all the boilerplate
of a new argument in every callback. Aside from a per_cpu var -- but
that is excessive.

But it can just be left zero in all callsites, except for
inet_gro_complete/ipv6_gro_complete, which pass in nhoff.

> Reproduction example:
>
> Endpoint configuration example (fou + local address bind)
>
>     # ip fou add port 6666 ipproto 4
>     # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
>     # ip link set tun1 up
>     # ip a add 1.1.1.2/24 dev tun1
>
> Netperf TCP_STREAM result on net-next before patch is applied:
>
> net-next main, GRO enabled:
>     $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>
>     131072  16384  16384    5.28        2.37
>
> net-next main, GRO disabled:
>     $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>
>     131072  16384  16384    5.01     2745.06
>
> patch applied, GRO enabled:
>     $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
>     Recv   Send    Send
>     Socket Socket  Message  Elapsed
>     Size   Size    Size     Time     Throughput
>     bytes  bytes   bytes    secs.    10^6bits/sec
>
>     131072  16384  16384    5.01     2877.38
>
> Fixes: 57c67ff4bd92 ("udp: additional GRO support")

This and the previous change should really target net, as they are
bug fixes.
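
To make the IPv4 options point above concrete, here is a small userspace
sketch (not kernel code). struct demo_iphdr and all helper names are made up
for this illustration. It builds an IPv4 header with a configurable number of
option bytes, followed by room for a UDP header, and compares the naive
"thoff - sizeof(iphdr)" guess against an explicitly passed network offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified IPv4 header (20 bytes); NOT the kernel's struct iphdr. */
struct demo_iphdr {
    uint8_t  ver_ihl;   /* version in high nibble, header length in 32-bit words in low nibble */
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/*
 * Build [IPv4 header + opt_len option bytes][8 bytes reserved for UDP] at
 * offset 0 of buf. Returns the transport header offset (thoff), or 0 if
 * the buffer is too small.
 */
static size_t build_packet(uint8_t *buf, size_t buf_len, size_t opt_len)
{
    size_t ip_len = sizeof(struct demo_iphdr) + opt_len;

    /* IPv4 options are padded to 32-bit words and limited to 40 bytes. */
    assert(opt_len % 4 == 0 && opt_len <= 40);
    if (buf_len < ip_len + 8)
        return 0;

    memset(buf, 0, buf_len);
    buf[0] = (uint8_t)(0x40 | (ip_len / 4));             /* version 4 + IHL */
    buf[9] = 17;                                         /* protocol: UDP */
    memset(buf + sizeof(struct demo_iphdr), 1, opt_len); /* fill options with NOP */
    return ip_len;
}

/* Naive guess: assume the IPv4 header right before thoff carries no options. */
static size_t guess_nhoff_fixed(size_t thoff)
{
    return thoff - sizeof(struct demo_iphdr);
}

/* IP version nibble of the byte at offset off. */
static unsigned int ip_version_at(const uint8_t *data, size_t off)
{
    return data[off] >> 4;
}

/* IPv4 header length in bytes, read from the IHL nibble at offset off. */
static size_t ip_hdr_len_at(const uint8_t *data, size_t off)
{
    return (size_t)(data[off] & 0x0f) * 4;
}
```

With four option bytes, build_packet() returns thoff == 24, so
guess_nhoff_fixed() lands on offset 4, in the middle of the real header,
where ip_version_at() no longer reads 4. Using the known network offset 0
gives version 4 and ip_hdr_len_at() equal to thoff. Without options both
approaches agree, which is why the shortcut only breaks on some packets.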