> OK. Here is another data point that shows the perf report with the same test
> but CPU mitigations turned OFF. Here bpf_prog overhead goes down from almost
> (10.18 + 4.51)% to (3.23 + 1.44)%.
>
>   21.40%  ksoftirqd/28  [i40e]                     [k] i40e_clean_rx_irq_zc
>   14.13%  xdpsock       [i40e]                     [k] i40e_clean_rx_irq_zc
>    8.33%  ksoftirqd/28  [kernel.vmlinux]           [k] xsk_rcv
>    6.09%  ksoftirqd/28  [kernel.vmlinux]           [k] xdp_do_redirect
>    5.19%  xdpsock       xdpsock                    [.] main
>    3.48%  ksoftirqd/28  [kernel.vmlinux]           [k] bpf_xdp_redirect_map
>    3.23%  ksoftirqd/28  bpf_prog_3c8251c7e0fef8db  [k] bpf_prog_3c8251c7e0fef8db
>
> So a major component of the bpf_prog overhead seems to be due to the CPU
> vulnerability mitigations.
I think that's an incorrect conclusion, because the JIT is not emitting any retpolines (there are no indirect calls in bpf), so there should be no difference in bpf_prog runtime with or without mitigations. Also, you're running as root, so no spectre mitigations apply there either.
This 3% seems like a lot for a function that does a few loads that should hit the d-cache and one direct call. Please investigate why you're seeing this 10% CPU cost when mitigations are on; perf report/annotate is the best tool for that. Also, please double-check that you're using the latest perf, since bpf performance analysis was greatly improved several versions ago. I don't think an old perf will show bogus numbers, but it's better to run the latest.
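For reference, bpf_prog_3c8251c7e0fef8db here should be nothing more than the usual xdpsock XSKMAP redirect program. A rough sketch of that kind of program follows; the map and section names, and the XDP_PASS default-action flag with its older-kernel lookup fallback, are assumptions based on the common libbpf/samples/bpf defaults, not something taken from this report:

/* Sketch of an xdpsock-style XSKMAP redirect program. Names and the
 * fallback path are illustrative assumptions, not taken from the data
 * in this thread.
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, 64);
        __uint(key_size, sizeof(int));
        __uint(value_size, sizeof(int));
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_sock_prog(struct xdp_md *ctx)
{
        __u32 index = ctx->rx_queue_index;
        long ret;

        /* Redirect to the AF_XDP socket bound to this rx queue,
         * defaulting to XDP_PASS if the lookup fails. This compiles to
         * a direct call into the bpf_redirect_map() helper.
         */
        ret = bpf_redirect_map(&xsks_map, index, XDP_PASS);
        if (ret > 0)
                return ret;

        /* Fallback for kernels without default-action support in the
         * flags argument.
         */
        if (bpf_map_lookup_elem(&xsks_map, &index))
                return bpf_redirect_map(&xsks_map, index, 0);

        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

The only calls in such a program are direct calls into helpers, which is why the JITed code contains no retpolines regardless of the mitigation settings.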
Here is the perf annotate output for bpf_prog_ with mitigations turned ON and
OFF, using perf built from the bpf-next tree (perf version 5.3.g4071324a76c1).

With mitigations ON
-------------------
Samples: 6K of event 'cycles', 4000 Hz, Event count (approx.): 5646512726
bpf_prog_3c8251c7e0fef8db  bpf_prog_3c8251c7e0fef8db  [Percent: local period]
 45.05      push   %rbp
  0.02      mov    %rsp,%rbp
  0.03      sub    $0x8,%rsp
 22.09      push   %rbx
  7.66      push   %r13
  1.08      push   %r14
  1.85      push   %r15
  0.63      pushq  $0x0
  1.13      mov    0x28(%rdi),%rsi
  0.47      mov    0x8(%rsi),%esi
  3.47      mov    %esi,-0x4(%rbp)
  0.02      movabs $0xffff8ab414a83e00,%rdi
  0.90      mov    $0x2,%edx
  2.85      callq  *ffffffffd149fc5f
  1.55      and    $0x6,%rax
            test   %rax,%rax
  1.48      jne    72
            mov    %rbp,%rsi
            add    $0xfffffffffffffffc,%rsi
            movabs $0xffff8ab414a83e00,%rdi
            callq  *ffffffffd0e5fd5f
            mov    %rax,%rdi
            mov    $0x2,%eax
            test   %rdi,%rdi
            je     72
            mov    -0x4(%rbp),%esi
            movabs $0xffff8ab414a83e00,%rdi
            xor    %edx,%edx
            callq  *ffffffffd149fc5f
       72:  pop    %rbx
            pop    %r15
  1.90      pop    %r14
  1.93      pop    %r13
            pop    %rbx
  3.63      leaveq
  2.27      retq

With mitigations OFF
--------------------
Samples: 2K of event 'cycles', 4000 Hz, Event count (approx.): 1872116166
bpf_prog_3c8251c7e0fef8db  bpf_prog_3c8251c7e0fef8db  [Percent: local period]
  0.15      push   %rbp
            mov    %rsp,%rbp
 13.79      sub    $0x8,%rsp
  0.30      push   %rbx
  0.15      push   %r13
  0.20      push   %r14
 14.50      push   %r15
  0.20      pushq  $0x0
            mov    0x28(%rdi),%rsi
  0.25      mov    0x8(%rsi),%esi
 14.37      mov    %esi,-0x4(%rbp)
  0.25      movabs $0xffff8ea2c673b800,%rdi
            mov    $0x2,%edx
 13.60      callq  *ffffffffe50c2f38
 14.33      and    $0x6,%rax
            test   %rax,%rax
            jne    72
            mov    %rbp,%rsi
            add    $0xfffffffffffffffc,%rsi
            movabs $0xffff8ea2c673b800,%rdi
            callq  *ffffffffe4a83038
            mov    %rax,%rdi
            mov    $0x2,%eax
            test   %rdi,%rdi
            je     72
            mov    -0x4(%rbp),%esi
            movabs $0xffff8ea2c673b800,%rdi
            xor    %edx,%edx
            callq  *ffffffffe50c2f38
       72:  pop    %rbx
            pop    %r15
 13.97      pop    %r14
  0.10      pop    %r13
            pop    %rbx
 13.71      leaveq
  0.15      retq

Do you see any issues with this data? With mitigations ON, the push %rbp and
push %rbx overhead seems to be pretty high.

Thanks,
Sridhar