On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
> iph->id, ...) against all packets in a loop. These flush checks are used
> currently in all tcp flows and in some UDP flows in GRO.
>
> These checks need to be done only once and only against the found p skb,
> since they only affect flush and not same_flow.
>
> Leveraging the previous commit in the series, in which correct network
> header offsets are saved for both outer and inner network headers -
> allowing these checks to be done only once, in tcp_gro_receive and
> udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at
> all. In addition, flush_id checks are more declarative and contained in
> inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
>
> This results in less parsing code for UDP flows and non-loop flush tests
> for TCP flows.
>
> To make sure results are not within noise range - I've made netfilter drop
> all TCP packets, and measured CPU performance in GRO (in this case GRO is
> responsible for about 50% of the CPU utilization).
>
> L3 flush/flush_id checks are not relevant to UDP connections where
> skb_gro_receive_list is called. The only code change relevant to this flow
> is inet_gro_receive. The rest of the code parsing this flow stays the
> same.
>
> All concurrent connections tested are with the same ip srcaddr and
> dstaddr.
>
> perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
> net-next:
>   3.03% [kernel] [k] inet_gro_receive
>
> patch applied:
>   2.78% [kernel] [k] inet_gro_receive

Why are there no figures for udp_gro_receive_segment()/gro_network_flush()
here?

Also, you should be able to observe a very high amount of CPU usage by GRO
even with TCP on very high speed links, by keeping the BH/GRO on one CPU
and the user-space/data copy on a different one (or by using rx zero copy).

Thanks,

Paolo