Re: [PATCH net-next v8 2/3] net: gro: move L3 flush checks to tcp_gro_receive and udp_gro_receive_segment

Richard Gobert <richardbgobert@xxxxxxxxx> · Tue, 7 May 2024 18:18:18 +0200

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
>> iph->id, ...) against all packets in a loop. These flush checks are used in
>> all merging UDP and TCP flows.
>>
>> These checks need to be done only once and only against the found p skb,
>> since they only affect flush and not same_flow.
>>
>> This patch leverages correct network header offsets from the cb for both
>> outer and inner network headers - allowing these checks to be done only
>> once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
>> NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
>> more declarative and contained in inet_gro_flush, thus removing the need
>> for flush_id in napi_gro_cb.
>>
>> This results in less parsing code for non-loop flush tests for TCP and UDP
>> flows.
>>
>> To make sure results are not within noise range - I've made netfilter drop
>> all TCP packets, and measured CPU performance in GRO (in this case GRO is
>> responsible for about 50% of the CPU utilization).
>>
>> perf top while replaying 64 parallel IP/TCP streams merging in GRO:
>> (gro_network_flush is compiled inline to tcp_gro_receive)
>> net-next:
>>         6.94% [kernel] [k] inet_gro_receive
>>         3.02% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         4.27% [kernel] [k] tcp_gro_receive
>>         4.22% [kernel] [k] inet_gro_receive
>>
>> perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
>> results for any encapsulation, in this case inet_gro_receive is top
>> offender in net-next)
>> net-next:
>>         10.09% [kernel] [k] inet_gro_receive
>>         2.08% [kernel] [k] tcp_gro_receive
>>
>> patch applied:
>>         6.97% [kernel] [k] inet_gro_receive
>>         3.68% [kernel] [k] tcp_gro_receive
> 
> Thanks for getting the additional numbers. The savings are not huge.
> 
> But +1 on the change also because it simplifies this non-obvious
> logic. It makes sense to separate flow matching and flush logic.
> 
> Btw please include Alexander Duyck in the Cc: of this series. 

Thanks, will do that when I re-post.

>> +static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
>> +				 struct sk_buff *p, bool outer)
>> +{
>> +	const u32 id = ntohl(*(__be32 *)&iph->id);
>> +	const u32 id2 = ntohl(*(__be32 *)&iph2->id);
>> +	const u16 flush_id = (id >> 16) - (id2 >> 16);
>> +	const u16 count = NAPI_GRO_CB(p)->count;
>> +	const u32 df = id & IP_DF;
>> +	u32 is_atomic;
>> +	int flush;
>> +
>> +	/* All fields must match except length and checksum. */
>> +	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
>> +
>> +	if (outer && df)
>> +		return flush;
> 
> Does the fixed id logic apply equally to inner and outer IPv4?
> 

Fixed id logic is not applied equally to inner and outer IPv4. innermost
IDs are checked, but outer IPv4 IDs are not checked at all if DF is set.
This is the current logic in the code and this patch preserves it. To my
understanding this is explained as intentional by the original commit author
(20160407223218.11142.26592.stgit@ahduyck-xeon-server)

Alexander - could you maybe elaborate further?

>> +
>> +	/* When we receive our second frame we can make a decision on if we
>> +	 * continue this flow as an atomic flow with a fixed ID or if we use
>> +	 * an incrementing ID.
>> +	 */
>> +	NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
>> +	is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
>> +
>> +	return flush | (flush_id ^ (count & is_atomic));
> 
> This is a good time to consider making this logical more obvious.
> 
> First off, the flush check can be part of the outer && df above, as
> flush is not modified after.
> 
> Subjective, but I find the following more readable, and not worth
> saving a few branches.
> 
>         if (count == 1 && df && !flush_id)
>                 NAPI_GRO_CB(p)->is_atomic = true;
> 
> 	ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ df;
> 	ipid_offset_matches = ipid_offset - count;
> 
> 	return ip_fixedid_matches & ipid_offset_matches;
> 
> Have to be a bit careful about types. Have not checked that in detail.
> 

ip_fixedid_matches should also account for checking whether flush_id is 0
or not, if we're going for readability I'd suggest checking "which check"
should be done (fixedid or offset) and do only the appropriate one.

	if (count == 1 && df && !ipid_offset)
		NAPI_GRO_CB(p)->is_atomic = true;

	if (NAPI_GRO_CB(p)->is_atomic && df)
		return flush | ipid_offset;

	return flush | (ipid_offset ^ count);

> And while nitpicking:
> ipid_offset may be a more descriptive variable name than flush_id, and
> ip_fixedid  than is_atomic. If changing those does not result in a lot
> of code churn.
> 

I also think is_atomic is not the best name, I'll change both names in the
next patch.

>> +}
>> +
>> +static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2)
>> +{
>> +	/* <Version:4><Traffic_Class:8><Flow_Label:20> */
>> +	__be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
>> +
>> +	/* Flush if Traffic Class fields are different. */
>> +	return !!((first_word & htonl(0x0FF00000)) |
>> +		(__force __be32)(iph->hop_limit ^ iph2->hop_limit));
>> +}
>> +
>> +static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off)
>> +{
>> +	const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
> 
> Is this correct when udp_gro_complete clears this for tunnels?
> 

gro_network_flush is called in receive flow, so udp_gro_complete cannot be
called after gro_network_flush. I think the function name should be changed to
gro_receive_network_flush.

>> +	int flush = 0;
>> +	int i;
>> +
>> +	for (i = 0; i <= encap_mark; i++) {
>> +		const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
>> +		const void *nh = th - diff;
>> +		const void *nh2 = th2 - diff;
>> +
>> +		if (((struct iphdr *)nh)->version == 6)
>> +			flush |= ipv6_gro_flush(nh, nh2);
>> +		else
>> +			flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
>> +	}
> 
> Maybe slightly better for branch prediction, and more obvious, if
> creating a helper function __gro_network_flush and calling
> 
>     __gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[0])
>     if (NAPI_GRO_CB(p)->encap_mark)
>             __gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[1])
> 
>> +
>> +	return flush;
>> +}
>> +
>>  int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
>>