On 7/22/2021 12:10 AM, Paolo Abeni wrote: > Hello, > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: >> On 7/21/2021 9:44 AM, Paolo Abeni wrote: >>> This is a very early draft - in a different world would be >>> replaced by hallway discussion at in-person conference - aimed at >>> outlining some ideas and collect feedback on the overall outlook. >>> There are still bugs to be fixed, more test and benchmark need, etc. >>> >>> There are 3 main goals: >>> - [try to] avoid the overhead for uncommon conditions at GRO time >>> (patches 1-4) >>> - enable backpressure for the veth GRO path (patches 5-6) >>> - reduce the number of cacheline used by the sk_buff lifecycle >>> from 4 to 3, at least in some common scenarios (patches 1,7-9). >>> The idea here is avoid the initialization of some fields and >>> control their validity with a bitmask, as presented by at least >>> Florian and Jesper in the past. >> If I understand correctly, you're creating an optimized case >> which excludes ct, secmark, vlan and UDP tunnel. Is this correct, >> and if so, why those particular fields? What impact will this have >> in the non-optimal (with any of the excluded fields) case? > Thank you for the feedback. You're most welcome. You did request comments. > > There are 2 different relevant points: > > - the GRO stage. > packets carring any of CT, dst, sk or skb_ext will do 2 additional > conditionals per gro_receive WRT the current code. My understanding is > that having any of such field set at GRO receive time is quite > exceptional for real nic. All others packet will do 4 or 5 less > conditionals, and will traverse a little less code. > > - sk_buff lifecycle > * packets carrying vlan and UDP will not see any differences: sk_buff > lifecycle will stil use 4 cachelines, as currently does, and no > additional conditional is introduced. > * packets carring nfct or secmark will see an additional conditional > every time such field is accessed. The number of cacheline used will > still be 4, as in the current code. My understanding is that when such > access happens, there is already a relevant amount of "additional" code > to be executed, the conditional overhead should not be measurable. I'm responsible for some of that "additonal" code. If the secmark is considered to be outside the performance critical data there are changes I would like to make that will substantially improve the performance of that "additional" code that would include a u64 secmark. If use of a secmark is considered indicative of a "slow" path, the rationale for restricting it to u32, that it might impact the "usual" case performance, seems specious. I can't say that I understand all the nuances and implications involved. It does appear that the changes you've suggested could negate the classic argument that requires the u32 secmark. > > Cheers, > > Paolo >