On Fri, Jun 21, 2024 at 4:57 AM Paolo Abeni <pabeni@xxxxxxxxxx> wrote:
>
> On Thu, 2024-06-20 at 15:19 -0700, Yan Zhai wrote:
> > Software GRO is currently controlled by a single switch, i.e.
> >
> > ethtool -K dev gro on|off
> >
> > However, this is not always desired. When GRO is enabled, even if the
> > kernel cannot GRO certain traffic, it still has to run through the GRO
> > receive handlers with no benefit.
> >
> > There are also scenarios where turning off GRO is a requirement. For
> > example, our production environment has a scenario where a TC egress
> > hook may add multiple encapsulation headers to forwarded skbs for load
> > balancing and isolation purposes. The encapsulation is implemented via
> > BPF. But then a problem arises: there is no way to properly offload a
> > double-encapsulated packet, since the skb only has network_header and
> > inner_network_header, which can track one layer of encapsulation, but
> > not two. On the other hand, not all the traffic through this device
> > needs double encapsulation. But we have to turn off GRO completely for
> > any ingress device as a result.
>
> Could you please add more details WRT this last statement? I'm unsure
> whether I understand your problem. My guess is as follows:
>
> Your device receives some traffic, GROs and forwards it, and the
> multiple encapsulation can happen on such forwarded traffic (since I
> can find almost none of the above in your message, this is mainly a
> wild guess).
>
> Assuming I guessed correctly, I think you could solve the problem with
> no kernel changes: redirect the to-be-tunneled traffic to some virtual
> device, enable all TX offloads on top of it, and let the encap happen
> there.
>

Let's say we have a netns that implements network functions like DoS
protection/IDS/load balancing for IP traffic. The netns has a single
veth entrance/exit, and a bunch of IP tunnels (GRE/XFRM) to receive and
tunnel traffic from customers' private sites.
Some of this traffic could be encapsulated again to reach services
outside of the netns (but on the same server); for example, customers
may also want to use our CDN/caching functionality. The complication
here is that we might have to further tunnel traffic to another data
center: routing is asymmetric, so we can receive client traffic from
the US while the response comes back to our EU data center, and in
order to provide layer-4/layer-7 services we have to make sure both
land on the same server.

It is true that a device like a veth pair, or even netkit, could let
the kernel segment GRO packets for us. But this does not actually seem
right in terms of design: if we already know a packet path should not
be GRO-ed, can we enforce that rather than aggregating packets only to
chop them back up soon after? For our specific case it also becomes a
headache for analytics and customer rules that rely on the ingress
device name: we would probably need to pair each tunnel with such a
virtual device. There could be hundreds of IPsec tunnels, and that
seems to be a substantial overhead for both the data path and
control-plane management.

To make this a bit more general, what I'd like to introduce here is:
when we know GRO is either problematic or simply not useful (as for
some UDP traffic), can we have finer-grained controls to skip it?

thanks
Yan

> Cheers,
>
> Paolo
>