On Thu, Sep 24, 2020 at 11:22 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > > Port of tail_call_static() helper function from Cilium's BPF code base [0] > to libbpf, so others can easily consume it as well. We've been using this > in production code for some time now. The main idea is that we guarantee > that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the > JITed BPF insns with direct jumps instead of having to fall back to using > expensive retpolines. By using inline asm, we guarantee that the compiler > won't merge the call from different paths with potentially different > content of r2/r3. > > We're also using __throw_build_bug() macro in different places as a neat > trick to trigger compilation errors when compiler does not remove code at > compilation time. This works for the BPF backend as it does not implement > the __builtin_trap(). > > [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035 > > Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > --- > tools/lib/bpf/bpf_helpers.h | 32 ++++++++++++++++++++++++++++++++ > 1 file changed, 32 insertions(+) > > diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h > index 1106777df00b..18b75a4c82e6 100644 > --- a/tools/lib/bpf/bpf_helpers.h > +++ b/tools/lib/bpf/bpf_helpers.h > @@ -53,6 +53,38 @@ > }) > #endif > > +/* > + * Misc useful helper macros > + */ > +#ifndef __throw_build_bug > +# define __throw_build_bug() __builtin_trap() > +#endif this will become part of libbpf stable API, do we want/need to expose it? If we want to expose it, then we should probably provide a better description. But also curious, how is it better than _Static_assert() (see test_cls_redirect.c), which also allows to provide a better error message? > + > +static __always_inline void > +bpf_tail_call_static(void *ctx, const void *map, const __u32 slot) > +{ > + if (!__builtin_constant_p(slot)) > + __throw_build_bug(); > + > + /* > + * Don't gamble, but _guarantee_ that LLVM won't optimize setting > + * r2 and r3 from different paths ending up at the same call insn as > + * otherwise we won't be able to use the jmpq/nopl retpoline-free > + * patching by the x86-64 JIT in the kernel. > + * So the clobbering comment below is completely clear. But this one is less clear without some sort of example situation in which bad things happen. Do you mind providing some pseudo-C example in which the compiler will optimize things in such a way that the tail call patching won't happen? > + * Note on clobber list: we need to stay in-line with BPF calling > + * convention, so even if we don't end up using r0, r4, r5, we need > + * to mark them as clobber so that LLVM doesn't end up using them > + * before / after the call. > + */ > + asm volatile("r1 = %[ctx]\n\t" > + "r2 = %[map]\n\t" > + "r3 = %[slot]\n\t" > + "call 12\n\t" > + :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot) > + : "r0", "r1", "r2", "r3", "r4", "r5"); > +} > + > /* > * Helper structure used by eBPF C program > * to describe BPF map attributes to libbpf loader > -- > 2.21.0 >