Re: [PATCH bpf-next v2 1/2] bpf,x64: use shrx/sarx/shlx when available

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Tue, 27 Sep 2022 11:45:33 +0200

On 9/27/22 2:38 AM, Jie Meng wrote:
On Mon, Sep 26, 2022 at 09:16:41PM +0200, Daniel Borkmann wrote:
On 9/24/22 2:32 AM, Jie Meng wrote:
Instead of shr/sar/shl, which implicitly use %cl, emit their more flexible
alternatives provided by BMI2.

Signed-off-by: Jie Meng <jmeng@xxxxxx>
---
   arch/x86/net/bpf_jit_comp.c | 53 +++++++++++++++++++++++++++++++++++++
   1 file changed, 53 insertions(+)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index ae89f4143eb4..2227d81a5e44 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -889,6 +889,35 @@ static void emit_nops(u8 **pprog, int len)
   	*pprog = prog;
   }
+static void emit_3vex(u8 **pprog, bool r, bool x, bool b, u8 m,
+		      bool w, u8 src_reg2, bool l, u8 p)
+{
+	u8 *prog = *pprog;
+	u8 b0 = 0xc4, b1, b2;
+	u8 src2 = reg2hex[src_reg2];
+
+	if (is_ereg(src_reg2))
+		src2 |= 1 << 3;
+
+	/*
+	 *    7                           0
+	 *  +---+---+---+---+---+---+---+---+
+	 *  |~R |~X |~B |         m         |
+	 *  +---+---+---+---+---+---+---+---+
+	 */
+	b1 = (!r << 7) | (!x << 6) | (!b << 5) | (m & 0x1f);
+	/*
+	 *    7                           0
+	 *  +---+---+---+---+---+---+---+---+
+	 *  | W |     ~vvvv     | L |   pp  |
+	 *  +---+---+---+---+---+---+---+---+
+	 */
+	b2 = (w << 7) | ((~src2 & 0xf) << 3) | (l << 2) | (p & 3);
+
+	EMIT3(b0, b1, b2);
+	*pprog = prog;
+}
+
   #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
   static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
@@ -1135,7 +1164,31 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image
   		case BPF_ALU64 | BPF_LSH | BPF_X:
   		case BPF_ALU64 | BPF_RSH | BPF_X:
   		case BPF_ALU64 | BPF_ARSH | BPF_X:
+			if (boot_cpu_has(X86_FEATURE_BMI2) && src_reg != BPF_REG_4) {
+				/* shrx/sarx/shlx dst_reg, dst_reg, src_reg */
+				bool r = is_ereg(dst_reg);
+				u8 m = 2; /* escape code 0f38 */
+				bool w = (BPF_CLASS(insn->code) == BPF_ALU64);

Looks like you just pass all the above vars into emit_3vex(), so why not hide them
there directly? The only thing really needed is p (which should probably be called op),
so you would just call emit_3vex(&prog, op, dst_reg, src_reg).

emit_3vex() encodes the 3-byte VEX prefix and exposes every field that can
be encoded. The intent is to make it reusable for future instructions that
may use VEX, so I deliberately avoided hardcoding anything specific to a
particular instruction.

This bit of context was missing from your description, but I also think it's okay
to do the refactor once this actually gets reused. (You could also just hide these
in an emit_shift() which calls emit_3vex() or similar, and explain your rationale
in the commit message.)
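
For illustration only, a user-space sketch of what such an emit_shift() wrapper
could look like. Everything here is an assumption rather than the kernel code:
it writes into a plain byte buffer instead of using EMIT*(), takes raw x86
register numbers (rax = 0, rcx = 1, ..., r15 = 15) instead of BPF registers and
reg2hex[], and the names emit_vex3, emit_shift and enum vex_shift_op are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op selector; the value doubles as the VEX "pp" field. */
enum vex_shift_op {
	VEX_SHLX = 1,	/* pp = 01: implied 0x66 prefix */
	VEX_SARX = 2,	/* pp = 10: implied 0xf3 prefix */
	VEX_SHRX = 3,	/* pp = 11: implied 0xf2 prefix */
};

/* Generic 3-byte VEX prefix: 0xc4, [~R ~X ~B mmmmm], [W ~vvvv L pp]. */
static size_t emit_vex3(uint8_t *out, bool r, bool x, bool b, uint8_t m,
			bool w, uint8_t vvvv, bool l, uint8_t pp)
{
	out[0] = 0xc4;
	out[1] = (uint8_t)((!r << 7) | (!x << 6) | (!b << 5) | (m & 0x1f));
	out[2] = (uint8_t)((w << 7) | ((~vvvv & 0xf) << 3) | (l << 2) | (pp & 3));
	return 3;
}

/*
 * shlx/sarx/shrx dst, dst, cnt: the shift-specific choices (opcode map
 * 0f38, L = 0, opcode 0xf7, register-direct ModRM) live here, so the
 * caller only passes the operation, the operand width and two registers.
 */
static size_t emit_shift(uint8_t *out, enum vex_shift_op op, bool is64,
			 uint8_t dst, uint8_t cnt)
{
	bool ext = dst >= 8;	/* r8..r15 need the R and B extension bits */
	size_t n;

	n = emit_vex3(out, ext, false, ext, 2 /* map 0f38 */, is64, cnt,
		      false, (uint8_t)op);
	out[n++] = 0xf7;
	/* mod = 11, reg = dst (destination), rm = dst (value to shift) */
	out[n++] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (dst & 7));
	return n;
}
```

With this, emit_shift(buf, VEX_SHLX, true, 0, 1) yields c4 e2 f1 f7 c0, i.e.
shlx rax, rax, rcx, while the generic prefix helper stays free of anything
shift-specific.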

Please also improve the commit message a bit: a before/after disasm + opcode hexdump
example (e.g. extracted from bpftool dump) would be nice. Also add a sentence about
the BPF_REG_4 limitation case.

Sure, I can do that, but I would like to know your opinion about emit_3vex()
first.
  
+				u8 p;
+
+				switch (BPF_OP(insn->code)) {
+				case BPF_LSH:
+					p = 1; /* prefix 0x66 */
+					break;
+				case BPF_RSH:
+					p = 3; /* prefix 0xf2 */
+					break;
+				case BPF_ARSH:
+					p = 2; /* prefix 0xf3 */
+					break;
+				}
+
+				emit_3vex(&prog, r, false, r, m,
+					  w, src_reg, false, p);
+				EMIT2(0xf7, add_2reg(0xC0, dst_reg, dst_reg));
+				break;
+			}
   			/* Check for bad case when dst_reg == rcx */
   			if (dst_reg == BPF_REG_4) {
   				/* mov r11, dst_reg */