Willem de Bruijn wrote:
> Richard Gobert wrote:
>> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
>> iph->id, ...) against all packets in a loop. These flush checks are
>> used in all merging UDP and TCP flows.
>>
>> These checks need to be done only once and only against the found p
>> skb, since they only affect flush and not same_flow.
>>
>> This patch leverages correct network header offsets from the cb for
>> both outer and inner network headers - allowing these checks to be
>> done only once, in tcp_gro_receive and udp_gro_receive_segment. As a
>> result, NAPI_GRO_CB(p)->flush is not used at all. In addition,
>> flush_id checks are more declarative and contained in inet_gro_flush,
>> thus removing the need for flush_id in napi_gro_cb.
>>
>> This results in less parsing code for non-loop flush tests for TCP
>> and UDP flows.
>>
>> To make sure results are not within noise range - I've made netfilter
>> drop all TCP packets, and measured CPU performance in GRO (in this
>> case GRO is responsible for about 50% of the CPU utilization).
>>
>> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
>> (gro_receive_network_flush is compiled inline to tcp_gro_receive)
>> net-next:
>>         6.94% [kernel] [k] inet_gro_receive
>>         3.02% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         4.27% [kernel] [k] tcp_gro_receive
>>         4.22% [kernel] [k] inet_gro_receive
>>
>> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO
>> (same results for any encapsulation, in this case inet_gro_receive is
>> top offender in net-next)
>> net-next:
>>         10.09% [kernel] [k] inet_gro_receive
>>          2.08% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         6.97% [kernel] [k] inet_gro_receive
>>         3.68% [kernel] [k] tcp_gro_receive
>>
>> Signed-off-by: Richard Gobert <richardbgobert@xxxxxxxxx>
>
>> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
>> +				 struct sk_buff *p, bool outer)
>> +{
>> +	const u32 id = ntohl(*(__be32 *)&iph->id);
>> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
>> +	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
>> +	const u16 count = NAPI_GRO_CB(p)->count;
>> +	const u32 df = id & IP_DF;
>> +	int flush;
>> +
>> +	/* All fields must match except length and checksum. */
>> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
>> +
>> +	if (outer && df)
>> +		return flush;
>
> if (flush)
>         return 1;
>
> To be able to avoid the two flush | below?
> Or to avoid adding a branch
>
> if (flush | (outer && df))
>         return 1;
>
>> +
>> +	/* When we receive our second frame we can make a decision on if we
>> +	 * continue this flow as an atomic flow with a fixed ID or if we use
>> +	 * an incrementing ID.
>> +	 */
>> +	if (count == 1 && df && !ipid_offset)
>> +		NAPI_GRO_CB(p)->ip_fixedid = true;
>> +
>> +	if (NAPI_GRO_CB(p)->ip_fixedid && df)
>> +		return flush | ipid_offset;
>> +
>> +	return flush | (ipid_offset ^ count);
>
> And then simply
>
> if (NAPI_GRO_CB(p)->ip_fixedid)
>         return ipid_offset;
> else
>         return ipid_offset ^ count;
>
> Since NAPI_GRO_CB(p)->ip_fixedid is only set if DF is set on the first
> two segments, and df ^ id2 & IP_DF is tested above, no need to test
> that again.
>
>> +}

I like the idea; it is more readable. We just need to return flush, and
not 1, to make it correct (since flush could be 0 while outer && df are
set):

	if (flush | (outer && df))
		return flush;

Not setting NAPI_GRO_CB(p)->ip_fixedid when flush == 1 is a slight
change from the previous behaviour.
AFAIU it is ok since it doesn't change GRO logic - p will be flushed
from gro_list and NAPI_GRO_CB becomes irrelevant.

Removing the DF check is nice. I also think we can avoid a branch while
keeping the code readable as follows:

	return ipid_offset ^ (count * !NAPI_GRO_CB(p)->ip_fixedid);
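
Putting both changes together, inet_gro_flush would look roughly like
this (an untested sketch of the combined proposal, keeping the
ip_fixedid semantics from the patch):

static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
				 struct sk_buff *p, bool outer)
{
	const u32 id = ntohl(*(__be32 *)&iph->id);
	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
	const u16 ipid_offset = (id >> 16) - (id2 >> 16);
	const u16 count = NAPI_GRO_CB(p)->count;
	const u32 df = id & IP_DF;
	int flush;

	/* All fields must match except length and checksum. */
	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) |
		(df ^ (id2 & IP_DF));

	/* Bail out on any field mismatch, or when this is an outer
	 * header with DF set, in which case the ipid does not matter.
	 * Returning flush (not 1) keeps the outer && df case correct.
	 */
	if (flush | (outer && df))
		return flush;

	/* When we receive our second frame we can make a decision on if
	 * we continue this flow as an atomic flow with a fixed ID or if
	 * we use an incrementing ID.
	 */
	if (count == 1 && df && !ipid_offset)
		NAPI_GRO_CB(p)->ip_fixedid = true;

	/* DF equality was already checked above, so ip_fixedid can be
	 * trusted here: fixed-ID flows need an identical ipid, other
	 * flows need the expected increment.
	 */
	return ipid_offset ^ (count * !NAPI_GRO_CB(p)->ip_fixedid);
}

Since !NAPI_GRO_CB(p)->ip_fixedid evaluates to 0 or 1, the multiply
either zeroes count or keeps it, matching the two return paths in your
version without a branch.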