This RFC(!) introduces the BPF dispatcher and xdp_call.h, a mechanism
to avoid the retpoline overhead by text-poking/rewriting indirect
calls to direct calls.

The ideas build on Alexei's V3 of the BPF trampoline work, namely:
  * Use the existing BPF JIT infrastructure to generate code
  * Use bpf_arch_text_poke() to modify the kernel text

To try the series out, you'll need V3 of the BPF trampoline work [1].

The main idea: each XDP call-site calls the jited dispatch table
instead of making an indirect call. The dispatch table calls the XDP
programs directly. In pseudo code this would be something similar to:

unsigned int do_call(struct bpf_prog *prog, struct xdp_buff *xdp)
{
	if (prog == PROG1)
		return call_direct_PROG1(xdp);
	if (prog == PROG2)
		return call_direct_PROG2(xdp);
	return indirect_call(prog, xdp);
}

The current dispatcher supports four entries. It could support more,
but I don't know if it's really practical (...and I was lazy -- more
than 4 entries meant moving to >1B Jcc. :-P). The dispatcher is
re-generated for each new XDP program/entry. The upper limit of four
in this series means that if six i40e netdevs have an XDP program
running, the fifth and sixth will be using an indirect call.

Now to the performance numbers. I ran this on my 3 GHz Skylake; 64B
UDP packets are sent to the i40e at ~40 Mpps.

Benchmark:
# ./xdp_rxq_info --dev enp134s0f0 --action XDP_DROP

1. Baseline:              26.0 Mpps
2. Dispatcher 1 entry:    35.5 Mpps (+36.5%)
3. Dispatcher 4 entries:  32.9 Mpps (+26.5%)
4. Dispatcher 5 entries:  24.2 Mpps (-6.9%)

In scenario 4 the benchmark uses the dispatcher, but the table is
full. This means that the caller pays for the dispatching *and* the
retpoline.

Is this a good idea? The performance is nice! Can it be done in a
better way? Useful for other BPF programs? I would love your input!

Thanks!
Björn

[1] https://patchwork.ozlabs.org/cover/1191672/

Björn Töpel (4):
  bpf: teach bpf_arch_text_poke() jumps
  bpf: introduce BPF dispatcher
  xdp: introduce xdp_call
  i40e: start using xdp_call.h

 arch/x86/net/bpf_jit_comp.c                 | 130 ++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c |   5 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c  |   5 +-
 include/linux/bpf.h                         |   3 +
 include/linux/xdp_call.h                    |  49 +++++
 kernel/bpf/Makefile                         |   1 +
 kernel/bpf/dispatcher.c                     | 197 ++++++++++++++++++++
 8 files changed, 388 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/xdp_call.h
 create mode 100644 kernel/bpf/dispatcher.c

-- 
2.20.1
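
For readers who want to play with the dispatch idea from the cover
letter outside the kernel, here is a minimal user-space sketch in C.
It is not code from the series. All names in it (fake_xdp_buff,
prog_fn, dispatch_table, dispatcher_add, fallback_calls, prog_a..e)
are hypothetical stand-ins, and it only models the control flow: a
four-entry table that is checked first, with an indirect call as the
fallback when the table is full. A C compiler will most likely still
emit indirect calls for the table hits; the series gets real direct
calls by JIT-generating the compare-and-jump sequence and patching it
in with bpf_arch_text_poke().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xdp_buff; not the kernel type. */
struct fake_xdp_buff {
	unsigned int len;
};

/* Hypothetical stand-in for a jited XDP program entry point. */
typedef unsigned int (*prog_fn)(struct fake_xdp_buff *xdp);

/* The cover letter states that the dispatcher supports four entries. */
#define DISPATCH_MAX 4

static prog_fn dispatch_table[DISPATCH_MAX];
static unsigned int fallback_calls; /* counts calls that missed the table */

/* Add a program to the table. Returns 0 on success, -1 if it is full. */
static int dispatcher_add(prog_fn fn)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < DISPATCH_MAX; i++) {
		if (dispatch_table[i] == fn)
			return 0; /* already present */
		if (dispatch_table[i] == NULL) {
			dispatch_table[i] = fn;
			return 0;
		}
	}
	return -1;
}

/*
 * Model of the call-site: compare against each table entry first and
 * fall back to the plain indirect call when nothing matches.
 */
static unsigned int do_call(prog_fn prog, struct fake_xdp_buff *xdp)
{
	size_t i;

	for (i = 0; i < DISPATCH_MAX; i++) {
		if (dispatch_table[i] == prog)
			return dispatch_table[i](xdp);
	}
	fallback_calls++;
	return prog(xdp);
}

/* Five dummy "programs" that just return a distinct verdict. */
static unsigned int prog_a(struct fake_xdp_buff *xdp) { (void)xdp; return 1; }
static unsigned int prog_b(struct fake_xdp_buff *xdp) { (void)xdp; return 2; }
static unsigned int prog_c(struct fake_xdp_buff *xdp) { (void)xdp; return 3; }
static unsigned int prog_d(struct fake_xdp_buff *xdp) { (void)xdp; return 4; }
static unsigned int prog_e(struct fake_xdp_buff *xdp) { (void)xdp; return 5; }
```

Adding prog_a through prog_d fills the table, so dispatcher_add(prog_e)
returns -1 and a later do_call(prog_e, ...) takes the fallback path.
That mirrors scenario 4 in the benchmark, where the caller pays for
the table walk and then for the indirect call as well.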