On 05/03/2025 15.32, arthur@xxxxxxxxxxxxxxx wrote:
From: Arthur Fabre <afabre@xxxxxxxxxxxxxx>
When inserting or deleting traits, we need to move any subsequent
traits over.
Replace it with an inline implementation to avoid the function call
overhead. This is especially expensive on AMD with SRSO.
In practice we shouldn't have too much data to move around, and we're
naturally limited to 238 bytes max, so a dumb implementation should
hopefully be fast enough.
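
For illustration only, a "dumb" inline move of the kind described above could
look roughly like the sketch below. This is not the code from the patch: the
name trait_move_inline and its signature are hypothetical, and it assumes both
pointers refer to the same buffer (so comparing them is well defined). Note
that a compiler may still recognize such loops and emit a library call, so a
real implementation would need to guard against that.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the patch's implementation): move n bytes from
 * src to dst without calling memmove(). The regions may overlap, and both
 * pointers are assumed to point into the same buffer.
 */
static inline void trait_move_inline(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	if (d == s || n == 0)
		return;

	if (d < s) {
		/* Destination is below the source: copy front to back. */
		for (size_t i = 0; i < n; i++)
			d[i] = s[i];
	} else {
		/*
		 * Destination is above the source: copy back to front so
		 * source bytes are read before they are overwritten.
		 */
		while (n--)
			d[n] = s[n];
	}
}
```

Moving to a higher address corresponds to making room for an inserted trait,
and moving to a lower address corresponds to closing the gap after a delete.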
Jesper Brouer kindly ran benchmarks on real hardware with three configs:
- Intel: E5-1650 v4
- AMD SRSO: 9684X SRSO
- AMD IBPB: 9684X SRSO=IBPB
                    Intel    AMD IBPB    AMD SRSO
xdp-trait-get       5.530       3.901       9.188   (ns/op)
xdp-trait-set       7.538       4.941      10.050   (ns/op)
xdp-trait-move     14.245       8.865      14.834   (ns/op)
function call       1.319       1.359       5.703   (ns/op)
indirect call       8.922       6.251      10.329   (ns/op)
I've done extensive *micro* benchmarking, documented here:
- https://github.com/xdp-project/xdp-project/tree/main/areas/hints
- In traits0X_* files
The latest that corresponds to this patchset is in this file:
- https://github.com/xdp-project/xdp-project/blob/main/areas/hints/traits07_bench-009.org
I've not done XDP_REDIRECT testing, which would likely show the effect of
the bitfield change in xdp_frame that Olek pointed out.
--Jesper