Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used in
> all merging UDP and TCP flows.
>
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
>
> This patch leverages correct network header offsets from the cb for both
> outer and inner network headers - allowing these checks to be done only
> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
> more declarative and contained in inet_gro_flush, thus removing the need
> for flush_id in napi_gro_cb.
>
> This results in less parsing code for non-loop flush tests for TCP and UDP
> flows.
>
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
>
> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
> (gro_receive_network_flush is compiled inline to tcp_gro_receive)
> net-next:
>         6.94% [kernel] [k] inet_gro_receive
>         3.02% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         4.27% [kernel] [k] tcp_gro_receive
>         4.22% [kernel] [k] inet_gro_receive
>
> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
> results for any encapsulation, in this case inet_gro_receive is top
> offender in net-next)
> net-next:
>         10.09% [kernel] [k] inet_gro_receive
>         2.08% [kernel] [k] tcp_gro_receive
>
> patch applied:
>         6.97% [kernel] [k] inet_gro_receive
>         3.68% [kernel] [k] tcp_gro_receive
>
> Signed-off-by: Richard Gobert <richardbgobert@xxxxxxxxx>

> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
> +				 struct sk_buff *p, bool outer)
> +{
> +	const u32 id = ntohl(*(__be32 *)&iph->id);
> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
> +	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
> +	const u16 count = NAPI_GRO_CB(p)->count;
> +	const u32 df = id & IP_DF;
> +	int flush;
> +
> +	/* All fields must match except length and checksum. */
> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
> +
> +	if (outer && df)
> +		return flush;

    if (flush)
            return 1;

To be able to avoid the two flush | below?

Or to avoid adding a branch

    if (flush | (outer && df))
            return 1;

> +
> +	/* When we receive our second frame we can make a decision on if we
> +	 * continue this flow as an atomic flow with a fixed ID or if we use
> +	 * an incrementing ID.
> +	 */
> +	if (count == 1 && df && !ipid_offset)
> +		NAPI_GRO_CB(p)->ip_fixedid = true;
> +
> +	if (NAPI_GRO_CB(p)->ip_fixedid && df)
> +		return flush | ipid_offset;
> +
> +	return flush | (ipid_offset ^ count);

And then simply

    if (NAPI_GRO_CB(p)->ip_fixedid)
            return ipid_offset;
    else
            return ipid_offset ^ count;

Since NAPI_GRO_CB(p)->ip_fixedid is only set if DF is set on the first
two segments, and df ^ id2 & IP_DF is tested above, no need to test
that again.

> +}
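
To illustrate the simplification discussed above, here is a rough userspace
sketch comparing the posted logic with one reading of the suggested form (the
"if (flush) return 1;" variant, with the outer && df case kept as a separate
early return). It is not kernel code: struct hdr, struct gro_state,
base_flush, flush_posted, flush_suggested, count_mismatches and self_check
are hypothetical stand-ins invented for this example, header fields are held
in host byte order, and the comparison assumes ip_fixedid is only ever set
when DF is set on p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_DF 0x4000u /* "don't fragment" bit of frag_off, host order */

/* Hypothetical stand-in for the struct iphdr fields used by the check. */
struct hdr {
	uint8_t ttl, tos;
	uint16_t id;       /* host byte order in this sketch */
	uint16_t frag_off; /* host byte order in this sketch */
};

/* Hypothetical stand-in for the NAPI_GRO_CB(p) fields used by the check. */
struct gro_state {
	uint16_t count;
	bool ip_fixedid;
};

/* Nonzero if ttl, tos or the DF bit differ between the two headers. */
static int base_flush(const struct hdr *h, const struct hdr *h2)
{
	return (h->ttl ^ h2->ttl) | (h->tos ^ h2->tos) |
	       ((h->frag_off ^ h2->frag_off) & IP_DF);
}

/* Control flow as posted in the patch. */
static int flush_posted(const struct hdr *h, const struct hdr *h2,
			struct gro_state *st, bool outer)
{
	const uint16_t ipid_offset = (uint16_t)(h->id - h2->id);
	const uint16_t df = h->frag_off & IP_DF;
	const int flush = base_flush(h, h2);

	if (outer && df)
		return flush;
	if (st->count == 1 && df && !ipid_offset)
		st->ip_fixedid = true;
	if (st->ip_fixedid && df)
		return flush | ipid_offset;
	return flush | (ipid_offset ^ st->count);
}

/* One reading of the suggestion: bail out early on a field mismatch. */
static int flush_suggested(const struct hdr *h, const struct hdr *h2,
			   struct gro_state *st, bool outer)
{
	const uint16_t ipid_offset = (uint16_t)(h->id - h2->id);
	const uint16_t df = h->frag_off & IP_DF;

	if (base_flush(h, h2))
		return 1;
	if (outer && df)
		return 0;
	if (st->count == 1 && df && !ipid_offset)
		st->ip_fixedid = true;
	if (st->ip_fixedid)
		return ipid_offset;
	return ipid_offset ^ st->count;
}

/* Count inputs where the two variants disagree on flush / no flush. */
int count_mismatches(void)
{
	int bad = 0;

	for (int bits = 0; bits < 64; bits++)
		for (uint16_t count = 1; count <= 3; count++)
			for (uint16_t off = 0; off <= 3; off++) {
				const bool outer = bits & 16, fixed = bits & 32;
				struct hdr h2 = { 64, 0, 100, (bits & 1) ? IP_DF : 0 };
				struct hdr h = { (uint8_t)(64 + !!(bits & 2)),
						 (uint8_t)!!(bits & 4),
						 (uint16_t)(100 + off),
						 (bits & 8) ? IP_DF : 0 };
				struct gro_state a = { count, fixed }, b = a;

				/* Assumed invariant: ip_fixedid implies DF on p. */
				if (fixed && !(h2.frag_off & IP_DF))
					continue;
				bad += !flush_posted(&h, &h2, &a, outer) !=
				       !flush_suggested(&h, &h2, &b, outer);
			}
	return bad;
}

/* Aborts if the two variants ever disagree under the assumed invariant. */
void self_check(void)
{
	assert(count_mismatches() == 0);
}
```

The comparison only looks at whether the return value is zero or nonzero.
Note that in this sketch the early return skips the ip_fixedid update when a
field mismatch is found, which the posted version does not; whether that
difference matters for a packet that is flushed anyway is outside the scope
of the sketch.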