On Wed, Oct 30, 2024 at 06:23:26PM -0600, Caleb Sander Mateos wrote: > net_dim() is currently passed a struct dim_sample argument by value. > struct dim_sample is 24 bytes. Since this is greater 16 bytes, x86-64 > passes it on the stack. All callers have already initialized dim_sample > on the stack, so passing it by value requires pushing a duplicated copy > to the stack. Either witing to the stack and immediately reading it, or > perhaps dereferencing addresses relative to the stack pointer in a chain > of push instructions, seems to perform quite poorly. > > In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time, > 94% of which is attributed to the first push instruction to copy > dim_sample on the stack for the call to net_dim(): > // Call ktime_get() > 0.26 |4ead2: call 4ead7 <mlx5e_handle_rx_dim+0x47> > // Pass the address of struct dim in %rdi > |4ead7: lea 0x3d0(%rbx),%rdi > // Set dim_sample.pkt_ctr > |4eade: mov %r13d,0x8(%rsp) > // Set dim_sample.byte_ctr > |4eae3: mov %r12d,0xc(%rsp) > // Set dim_sample.event_ctr > 0.15 |4eae8: mov %bp,0x10(%rsp) > // Duplicate dim_sample on the stack > 94.16 |4eaed: push 0x10(%rsp) > 2.79 |4eaf1: push 0x10(%rsp) > 0.07 |4eaf5: push %rax > // Call net_dim() > 0.21 |4eaf6: call 4eafb <mlx5e_handle_rx_dim+0x6b> > > To allow the caller to reuse the struct dim_sample already on the stack, > pass the struct dim_sample by reference to net_dim(). > > Signed-off-by: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx> > --- Reviewed-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>