Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used in
> all merging UDP and TCP flows.
>
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
>
> This patch leverages correct network header offsets from the cb for both
> outer and inner network headers - allowing these checks to be done only
> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
> more declarative and contained in inet_gro_flush, thus removing the need
> for flush_id in napi_gro_cb.
>
> This results in less parsing code for non-loop flush tests for TCP and UDP
> flows.
>
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
>
> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
> (gro_network_flush is compiled inline to tcp_gro_receive)
> net-next:
>         6.94% [kernel] [k] inet_gro_receive
>         3.02% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         4.27% [kernel] [k] tcp_gro_receive
>         4.22% [kernel] [k] inet_gro_receive
>
> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
> results for any encapsulation, in this case inet_gro_receive is top
> offender in net-next)
> net-next:
>         10.09% [kernel] [k] inet_gro_receive
>         2.08% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         6.97% [kernel] [k] inet_gro_receive
>         3.68% [kernel] [k] tcp_gro_receive

Thanks for getting the additional numbers. The savings are not huge.

But +1 on the change also because it simplifies this non-obvious logic.
It makes sense to separate flow matching and flush logic.

Btw please include Alexander Duyck in the Cc: of this series.

> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
> +				 struct sk_buff *p, bool outer)
> +{
> +	const u32 id = ntohl(*(__be32 *)&iph->id);
> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
> +	const u16 flush_id = (id >> 16) - (id2 >> 16);
> +	const u16 count = NAPI_GRO_CB(p)->count;
> +	const u32 df = id & IP_DF;
> +	u32 is_atomic;
> +	int flush;
> +
> +	/* All fields must match except length and checksum. */
> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
> +
> +	if (outer && df)
> +		return flush;

Does the fixed id logic apply equally to inner and outer IPv4?

> +
> +	/* When we receive our second frame we can make a decision on if we
> +	 * continue this flow as an atomic flow with a fixed ID or if we use
> +	 * an incrementing ID.
> +	 */
> +	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
> +	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
> +
> +	return flush | (flush_id ^ (count & is_atomic));

This is a good time to consider making this logic more obvious.

First off, the flush check can be part of the outer && df above, as
flush is not modified after.

Subjective, but I find the following more readable, and not worth
saving a few branches.

	if (count == 1 && df && !flush_id)
		NAPI_GRO_CB(p)->is_atomic = true;

	ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ df;
	ipid_offset_matches = ipid_offset - count;

	return ip_fixedid_matches & ipid_offset_matches;

Have to be a bit careful about types. Have not checked that in detail.
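For concreteness, a rough and untested sketch of how the whole helper
might then read - not the code posted in this series, and using the
ipid_offset naming from the nitpick below. The type care amounts to
reducing df to 0/1 before the xor and combining the two checks with &&
rather than &, since the two values do not share bits:

static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
				 struct sk_buff *p, bool outer)
{
	const u32 id = ntohl(*(__be32 *)&iph->id);
	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
	const u16 count = NAPI_GRO_CB(p)->count;
	const u32 df = id & IP_DF;
	u16 ip_fixedid_matches;
	u16 ipid_offset_matches;
	int flush;

	/* All fields must match except length and checksum. */
	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) |
		(df ^ (id2 & IP_DF));

	/* flush is not modified below, so it can be folded into the
	 * early return together with the outer && df case.
	 */
	if (flush || (outer && df))
		return flush;

	/* When we receive our second frame we can make a decision on if we
	 * continue this flow as an atomic flow with a fixed ID or if we use
	 * an incrementing ID.
	 */
	if (count == 1 && df && !ipid_offset)
		NAPI_GRO_CB(p)->is_atomic = true;

	/* Both values are zero iff the respective check passes: df is 0 or
	 * IP_DF, so reduce it to 0/1 before xoring with the is_atomic bit,
	 * and combine with && since the values do not share bits.
	 */
	ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ !!df;
	ipid_offset_matches = ipid_offset - count;

	return ip_fixedid_matches && ipid_offset_matches;
}

One behavioural difference to double check: written this way, a packet
without DF in a non-fixed-ID flow no longer has its ID offset compared
against count, while the posted code does compare it.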
And while nitpicking: ipid_offset may be a more descriptive variable
name than flush_id, and ip_fixedid than is_atomic. If changing those
does not result in a lot of code churn.

> +}
> +
> +static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
> +{
> +	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
> +	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
> +
> +	/* Flush if Traffic Class fields are different. */
> +	return !!((first_word & htonl(0x0FF00000)) |
> +		  (__force __be32)(iph->hop_limit ^ iph2->hop_limit));
> +}
> +
> +static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
> +{
> +	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;

Is this correct when udp_gro_complete clears this for tunnels?

> +	int flush = 0;
> +	int i;
> +
> +	for (i = 0; i <= encap_mark; i++) {
> +		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
> +		const void *nh = th - diff;
> +		const void *nh2 = th2 - diff;
> +
> +		if (((struct iphdr *)nh)->version == 6)
> +			flush |= ipv6_gro_flush(nh, nh2);
> +		else
> +			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
> +	}

Maybe slightly better for branch prediction, and more obvious, if
creating a helper function __gro_network_flush and calling

	__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[0])
	if (NAPI_GRO_CB(p)->encap_mark)
		__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[1])

A fuller sketch of what I mean is at the end of this mail.

> +
> +	return flush;
> +}
> +
>  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
>
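Spelling out that __gro_network_flush idea, a rough and untested sketch.
The extra bool outer argument is my addition: inet_gro_flush still needs
it, and it cannot be derived from the offset alone:

static inline int __gro_network_flush(const void *th, const void *th2,
				      struct sk_buff *p, const u16 diff,
				      bool outer)
{
	const void *nh = th - diff;
	const void *nh2 = th2 - diff;

	if (((struct iphdr *)nh)->version == 6)
		return ipv6_gro_flush(nh, nh2);

	return inet_gro_flush(nh, nh2, p, outer);
}

static inline int gro_network_flush(const void *th, const void *th2,
				    struct sk_buff *p, int off)
{
	/* Outer (or only) header: it counts as outer, and thus skips the
	 * ID check for DF packets, only when it encapsulates another one.
	 */
	int flush = __gro_network_flush(th, th2, p,
					off - NAPI_GRO_CB(p)->network_offsets[0],
					NAPI_GRO_CB(p)->encap_mark);

	if (NAPI_GRO_CB(p)->encap_mark)
		flush |= __gro_network_flush(th, th2, p,
					     off - NAPI_GRO_CB(p)->network_offsets[1],
					     false);

	return flush;
}

This keeps the common no-encapsulation case as a single straight-line
call instead of a loop whose bound depends on encap_mark.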