On Wed, 30 Oct 2024 18:23:26 -0600 Caleb Sander Mateos wrote:
> In a heavy TCP workload, mlx5e_handle_rx_dim() consumes 3% of CPU time,
> 94% of which is attributed to the first push instruction to copy
> dim_sample on the stack for the call to net_dim():

Change itself looks fine, so we can apply, but this seems surprising.
Are you sure this is not just some measurement problem?
Do you see 3% higher PPS with this change applied?