Re: [PATCH bpf-next 4/6] bpf, libbpf: add bpf_tail_call_static helper for bpf programs

On 9/24/20 11:21 AM, Daniel Borkmann wrote:
Port of tail_call_static() helper function from Cilium's BPF code base [0]
to libbpf, so others can easily consume it as well. We've been using this
in production code for some time now. The main idea is that we guarantee
that the kernel's BPF infrastructure and JIT (here: x86_64) can patch the
JITed BPF insns with direct jumps instead of having to fall back to using
expensive retpolines. By using inline asm, we guarantee that the compiler
won't merge the call from different paths with potentially different
content of r2/r3.
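
For illustration, a minimal usage sketch on the BPF program side (the map
name, slot value and section name are made up for the example and not part
of the patch):

   #include <linux/bpf.h>
   #include <bpf/bpf_helpers.h>

   struct {
       __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
       __uint(max_entries, 2);
       __uint(key_size, sizeof(__u32));
       __uint(value_size, sizeof(__u32));
   } jmp_table SEC(".maps");

   SEC("tc")
   int dispatch(struct __sk_buff *skb)
   {
       /* Slot 0 is a compile-time constant, so the x86-64 JIT can
        * patch this tail call into a direct jump, no retpoline needed.
        */
       bpf_tail_call_static(skb, &jmp_table, 0);

       /* Only reached if the tail call falls through, e.g. empty slot. */
       return 0;
   }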

We're also using the __throw_build_bug() macro in different places as a neat
trick to trigger compilation errors when the compiler does not remove code at
compilation time. This works for the BPF backend as it does not implement
__builtin_trap().
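
To make the failure mode concrete: the build bug only fires when the
optimizer cannot prove the guarded branch dead. A hypothetical caller
(reusing the jmp_table sketch above, not part of the patch):

   SEC("tc")
   int select_prog(struct __sk_buff *skb)
   {
       __u32 idx = skb->mark;

       /* Constant slot: the !__builtin_constant_p() branch is removed
        * at optimization time and this compiles fine.
        */
       bpf_tail_call_static(skb, &jmp_table, 1);

       /* Non-constant slot: __builtin_constant_p(idx) is false, the
        * __builtin_trap() survives, and the BPF backend rejects it,
        * turning the mistake into a build error instead of a run-time
        * surprise.
        */
       bpf_tail_call_static(skb, &jmp_table, idx);
       return 0;
   }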

   [0] https://github.com/cilium/cilium/commit/f5537c26020d5297b70936c6b7d03a1e412a1035

Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
---
  tools/lib/bpf/bpf_helpers.h | 32 ++++++++++++++++++++++++++++++++
  1 file changed, 32 insertions(+)

diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index 1106777df00b..18b75a4c82e6 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -53,6 +53,38 @@
  	})
  #endif
+/*
+ * Misc useful helper macros
+ */
+#ifndef __throw_build_bug
+# define __throw_build_bug()	__builtin_trap()

Just some general comments below. The patch itself looks fine to me.

I guess we will never implement a 'trap' insn? The only possible
use I know of is for a failed CO-RE relocation, which is currently
encoded as an illegal call insn.

+#endif
+
+static __always_inline void
+bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
+{
+	if (!__builtin_constant_p(slot))
+		__throw_build_bug();
+
+	/*
+	 * Don't gamble, but _guarantee_ that LLVM won't optimize setting
+	 * r2 and r3 from different paths ending up at the same call insn as
+	 * otherwise we won't be able to use the jmpq/nopl retpoline-free
+	 * patching by the x86-64 JIT in the kernel.
+	 *
+	 * Note on clobber list: we need to stay in-line with BPF calling
+	 * convention, so even if we don't end up using r0, r4, r5, we need
+	 * to mark them as clobber so that LLVM doesn't end up using them
+	 * before / after the call.
+	 */
+	asm volatile("r1 = %[ctx]\n\t"
+		     "r2 = %[map]\n\t"
+		     "r3 = %[slot]\n\t"
+		     "call 12\n\t"
+		     :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot)
+		     : "r0", "r1", "r2", "r3", "r4", "r5");

Clever scheme to enforce a constant!

This particular case is about avoiding the sinking of common code.
We have a similar use case in https://reviews.llvm.org/D87153,
which is about avoiding PHI merging of two relocation globals like
   phi = [reloc_global1, reloc_global2]  // reachable from two paths
   ... phi ...

The solution there is to insert a BPF-internal builtin
   val = __builtin_bpf_passthrough(seq_num, val)
in the proper places to prevent sinking of common code.

I guess in this example we could prevent the merging by having the
compiler insert passthrough builtins like
   ...
   ret = bpf_tail_call(ctx, map, slot)
   ret = __builtin_bpf_passthrough(seq_num, ret)
   ...

If, in the future, such a builtin proves useful for BPF program
writers as well, we could just define
   val = __builtin_bpf_passthrough(val)
and internally it would be transformed into
   val = llvm.bpf.passthrough(seq_num, val)
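
For readers following along, the shape of the merging problem being
discussed, as a hypothetical fragment (map and slot names made up): with
plain bpf_tail_call() and no asm barrier, the optimizer is free to sink
the two calls below into a single call insn whose index comes from a PHI,
which defeats the direct-jump patching in the kernel:

   if (cond)
       bpf_tail_call(skb, &jmp_table, SLOT_A);
   else
       bpf_tail_call(skb, &jmp_table, SLOT_B);
   /* may be merged into: r3 = phi(SLOT_A, SLOT_B); single 'call 12' */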

+}
+
  /*
   * Helper structure used by eBPF C program
   * to describe BPF map attributes to libbpf loader



