Eric Dumazet wrote:
>
> My intuition is that this patch has a high cost for normal GRO processing.
> SW-GRO is already a bottleneck on ARM cores in smart NICS.
>
> I would suggest instead using parameters to give both the nhoff and thoff values
> this would avoid many conditionals in the fast path.
>
> ->
>
> INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int
> nhoff, int thoff)
> {
>  const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)(skb->data + nhoff);
>  struct udphdr *uh = (struct udphdr *)(skb->data + thoff);
>  ...
> }
>
> INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int
> nhoff, int thoff)
> {
>  const struct ipv6hdr *iph = (const struct ipv6hdr *)(skb->data + nhoff);
>  struct tcphdr *th = (struct tcphdr *)(skb->data + thoff);
>
> Why storing in skb fields things that really could be propagated more
> efficiently as function parameters ?

Hi Eric,
Thanks for the review!

I agree, the conditionals could be a problem and are actually not needed.
The third commit in this patch series introduces an optimisation for
ipv6/ipv4 using the correct {inner_}network_header. We can remove the
conditionals; I thought about multiple ways to do so.

First, remove the conditional in skb_gro_network_offset:

	static inline int skb_gro_network_offset(const struct sk_buff *skb)
	{
		const u32 mask = NAPI_GRO_CB(skb)->encap_mark - 1;

		return (skb_network_offset(skb) & mask) |
		       (skb_inner_network_offset(skb) & ~mask);
	}

And for the conditionals in {inet,ipv6}_gro_receive I thought about two ideas.

The first is to move set_inner_network_header to encapsulation gro functions
like ipip_gro_receive, this way there's one less write (in comparison to main)
in these functions:

	static struct sk_buff *ipip_gro_receive(struct list_head *head,
						struct sk_buff *skb)
	{
		...
		NAPI_GRO_CB(skb)->encap_mark = 1;
		skb_set_inner_network_header(skb, skb_gro_offset(skb));

The second way is to always write to inner_network_header:

	INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
								 struct sk_buff *skb)
	{
		...
		skb_set_inner_network_header(skb, off);
		...

What do you think is better? I think the 1st is more beneficial for the
fast path.

We could then use the {inner_}network_header separation to optimise the
receive path, such as in the 3rd commit in this patch series.

Regards,
Richard
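
For reference, the mask trick proposed for skb_gro_network_offset can be checked in userspace. The sketch below is illustrative only and is not kernel code: struct fake_skb and both helper names are hypothetical stand-ins for skb_network_offset(skb), skb_inner_network_offset(skb) and NAPI_GRO_CB(skb)->encap_mark, and it assumes encap_mark is a one-bit flag. It compares the branchless selection against an equivalent conditional.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the three values the real helper reads. */
struct fake_skb {
	int network_offset;          /* role of skb_network_offset(skb) */
	int inner_network_offset;    /* role of skb_inner_network_offset(skb) */
	unsigned int encap_mark : 1; /* role of NAPI_GRO_CB(skb)->encap_mark */
};

/* Branchless selection, mirroring the expression in the mail. */
static inline int gro_network_offset(const struct fake_skb *skb)
{
	/*
	 * encap_mark == 0: 0 - 1 wraps to 0xffffffff, selecting network_offset.
	 * encap_mark == 1: 1 - 1 is 0, so ~mask selects inner_network_offset.
	 */
	const uint32_t mask = (uint32_t)skb->encap_mark - 1;

	return (int)(((uint32_t)skb->network_offset & mask) |
		     ((uint32_t)skb->inner_network_offset & ~mask));
}

/* The conditional form that the mask version is meant to replace. */
static inline int gro_network_offset_branchy(const struct fake_skb *skb)
{
	return skb->encap_mark ? skb->inner_network_offset
			       : skb->network_offset;
}

/* Both forms must agree for either value of encap_mark. */
static inline void gro_offset_selfcheck(void)
{
	struct fake_skb skb = { .network_offset = 14, .inner_network_offset = 54 };

	assert(gro_network_offset(&skb) == gro_network_offset_branchy(&skb));
	skb.encap_mark = 1;
	assert(gro_network_offset(&skb) == gro_network_offset_branchy(&skb));
}
```

The offsets 14 and 54 are arbitrary example values. Whether the mask form is actually faster than a well-predicted branch on a given core would still need to be measured.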