Re: [PATCH RFC bpf-next 06/20] trait: Replace memmove calls with inline move

On 05/03/2025 15.32, arthur@xxxxxxxxxxxxxxx wrote:
From: Arthur Fabre <afabre@xxxxxxxxxxxxxx>

When inserting or deleting traits, we need to move any subsequent
traits over.

Replace the memmove() calls used for this with an inline implementation
to avoid the function call overhead. The call is especially expensive on
AMD with the SRSO mitigation enabled.

In practice there shouldn't be much data to move around, and it is
naturally limited to at most 238 bytes, so a simple implementation
should hopefully be fast enough.
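Even a simple inline replacement for memmove() must handle overlap. Inserting a trait shifts the following bytes up, and deleting one shifts them down, so the copy direction has to match the shift direction. Here is a minimal sketch of such an overlap-safe loop. It assumes both pointers point into the same trait buffer. The names trait_move_bytes() and TRAITS_MAX_BYTES are hypothetical, and this is not the code from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound, taken from the 238-byte limit in the description. */
#define TRAITS_MAX_BYTES 238

/*
 * Hypothetical overlap-safe byte move. dst and src are assumed to point
 * into the same trait buffer, so comparing them is meaningful.
 */
static inline void trait_move_bytes(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* The small size bound is what makes a dumb loop acceptable. */
	assert(len <= TRAITS_MAX_BYTES);

	if (d == s || len == 0)
		return;

	if (d < s) {
		/* Shifting down (e.g. after a delete): copy front to back. */
		for (size_t i = 0; i < len; i++)
			d[i] = s[i];
	} else {
		/*
		 * Shifting up (e.g. opening a gap for an insert): copy back
		 * to front so source bytes are read before being overwritten.
		 */
		for (size_t i = len; i > 0; i--)
			d[i - 1] = s[i - 1];
	}
}
```

An optimizing compiler may recognize loops like these and turn them back into memmove()/memcpy() calls. Any real version would therefore need its generated code checked to confirm that the call is actually gone.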

Jesper Brouer kindly ran benchmarks on real hardware with three configs:
- Intel: E5-1650 v4
- AMD SRSO: 9684X SRSO
- AMD IBPB: 9684X SRSO=IBPB

(ns/op)              Intel   AMD IBPB   AMD SRSO
xdp-trait-get        5.530      3.901      9.188
xdp-trait-set        7.538      4.941     10.050
xdp-trait-move      14.245      8.865     14.834
function call        1.319      1.359      5.703
indirect call        8.922      6.251     10.329


I've done extensive *micro* benchmarking, documented here:
 - https://github.com/xdp-project/xdp-project/tree/main/areas/hints
 - in the traits0X_* files

The latest results, corresponding to this patchset, are in this file:
- https://github.com/xdp-project/xdp-project/blob/main/areas/hints/traits07_bench-009.org

I haven't done XDP_REDIRECT testing, which would likely show the effect of the xdp_frame bitfield change that Olek pointed out.

--Jesper