[v2] bpf: Propose some new instructions for -mcpu=v4

Yonghong Song <yhs@xxxxxxxx> · Sun, 26 Feb 2023 10:30:51 -0800

There have been several discussions about extending the bpf
instruction set (ISA) to accommodate new use cases or to
fix potential issues. These new instructions will
be included in a new cpu flavor, -mcpu=v4.

The following is a proposal to add new instructions in 6
different categories. The proposal is a little bit rough.
You can find bpf insn background information in
Documentation/bpf/instruction-set.rst. Compared to the previous
proposal (v1) in

https://lore.kernel.org/bpf/01515302-c37d-2ee5-c950-2f556a4caad0@xxxxxxxx/
there are two changes:
  . for sign extend load, the alu32_mode differentiator is removed
    since alu32_mode is only a compiler asm syntax mechanism in
    this case, and is not involved in insn encoding.
  . for sign extend mov, there is no support for sign extending
    an imm into a register.

The corresponding llvm implementation is at
    https://reviews.llvm.org/D144829

The proposal details follow.

SDIV/SMOD (signed div and mod)
==============================

bpf already has unsigned DIV and MOD. They are encoded as

   insn code(4 bits) source(1 bit) instruction class(3 bit) off(16 bits)
   DIV  0x3          0/1           BPF_ALU/BPF_ALU64        0
   MOD  0x9          0/1           BPF_ALU/BPF_ALU64        0

The current 'code' field has only two values left, 0xe and 0xf.
gcc used these two values (0xe and 0xf) for SDIV and SMOD.
But using these two values takes up all the remaining 'code' space
and makes future extension hard.

Here, I propose to encode SDIV/SMOD like below:

   insn code(4 bits) source(1 bit) instruction class(3 bit) off(16 bits)
   DIV  0x3          0/1           BPF_ALU/BPF_ALU64        1
   MOD  0x9          0/1           BPF_ALU/BPF_ALU64        1

Basically, we reuse the same 'code' value but change 'off' from 0 to 1
to indicate signed div/mod.
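To illustrate the intended semantics, here is a minimal C sketch of how
an interpreter could dispatch on 'off' for the 64-bit case. The helper
names alu64_div/alu64_mod are hypothetical. The divide-by-zero results
follow the existing unsigned DIV/MOD behavior; the handling of
INT64_MIN / -1 is only an assumption made to avoid C undefined
behavior, since the proposal does not specify it.

```c
#include <stdint.h>

/* Hypothetical model of BPF_ALU64 DIV: off == 0 is the existing
 * unsigned division, off == 1 is the proposed signed division. */
static uint64_t alu64_div(uint64_t dst, uint64_t src, int16_t off)
{
	if (src == 0)
		return 0;	/* same as existing unsigned DIV */
	if (off == 1) {
		int64_t a = (int64_t)dst, b = (int64_t)src;

		if (a == INT64_MIN && b == -1)
			return dst;	/* assumption: not defined by the proposal */
		return (uint64_t)(a / b);
	}
	return dst / src;
}

/* Hypothetical model of BPF_ALU64 MOD with the same 'off' selector. */
static uint64_t alu64_mod(uint64_t dst, uint64_t src, int16_t off)
{
	if (src == 0)
		return dst;	/* same as existing unsigned MOD */
	if (off == 1) {
		int64_t a = (int64_t)dst, b = (int64_t)src;

		if (a == INT64_MIN && b == -1)
			return 0;	/* assumption: not defined by the proposal */
		return (uint64_t)(a % b);
	}
	return dst % src;
}
```

With dst holding -7 and src holding 2, off == 1 gives -3 and -1 for
div and mod, while off == 0 treats dst as a large unsigned value.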

Sign extend load
================

Currently, normal load instructions generated by llvm are encoded as below.

   mode(3 bits)      size(2 bits)    instruction class(3 bits)
   BPF_MEM (0x3)     8/16/32/64      BPF_LDX

For mode, the values already in use are 0x0, 0x1, 0x2, 0x3 and 0x6.
The proposal is to use mode value 0x4 to encode sign extend loads.

   mode(3 bits)      size(2 bits)    instruction class(3 bits)
   BPF_SMEM (0x4)    8/16/32         BPF_LDX
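As a rough illustration of the difference between the two modes, the
sketch below models a BPF_MEM load (zero extend) next to a BPF_SMEM
load (sign extend) into a 64-bit register. The helper names ldx_mem and
ldx_smem are hypothetical, and the size is given in bytes here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an existing BPF_MEM load: zero extend. */
static uint64_t ldx_mem(const void *addr, unsigned int size)
{
	switch (size) {
	case 1: { uint8_t t;  memcpy(&t, addr, 1); return t; }
	case 2: { uint16_t t; memcpy(&t, addr, 2); return t; }
	case 4: { uint32_t t; memcpy(&t, addr, 4); return t; }
	case 8: { uint64_t t; memcpy(&t, addr, 8); return t; }
	}
	return 0;	/* invalid size, not reachable for valid insns */
}

/* Hypothetical model of the proposed BPF_SMEM load: sign extend.
 * Only 8/16/32-bit sizes exist; a 64-bit load has nothing to extend. */
static uint64_t ldx_smem(const void *addr, unsigned int size)
{
	switch (size) {
	case 1: { int8_t t;  memcpy(&t, addr, 1); return (uint64_t)(int64_t)t; }
	case 2: { int16_t t; memcpy(&t, addr, 2); return (uint64_t)(int64_t)t; }
	case 4: { int32_t t; memcpy(&t, addr, 4); return (uint64_t)(int64_t)t; }
	}
	return 0;	/* invalid size, not reachable for valid insns */
}
```

For a byte holding 0x80, ldx_mem yields 0x80 while ldx_smem yields
0xffffffffffffff80.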

Sign extend register mov
========================

The current BPF_MOV insn is encoded as
   insn code(4 bits) source(1 bit) instruction class(3 bit) off(16 bits)
   MOV  0xb          0/1           BPF_ALU/BPF_ALU64        0

Let us support sign extended move insn as defined below:

   insn code(4 bits) source(1 bit) instruction class(3 bit) off(16 bits)
   MOVS 0xb          1             BPF_ALU                  8/16
   MOVS 0xb          1             BPF_ALU64                8/16/32

In the above sign extended mov instructions, 'off' represents the 'size'
of the source value. For example, with the BPF_ALU class and 'off' equal
to 8, the insn sign extends an 8-bit value (in a register) to a 32-bit
value. With the BPF_ALU64 class, the same 8-bit value is sign extended
to a 64-bit value.
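The following C sketch shows one way to read these semantics. The
helper names movs32/movs64 are hypothetical. Zeroing the upper 32 bits
in the BPF_ALU case mirrors how existing 32-bit ALU insns behave and is
an assumption for MOVS.

```c
#include <stdint.h>

/* Hypothetical model of MOVS with BPF_ALU64: 'off' is the source width. */
static uint64_t movs64(uint64_t src, int16_t off)
{
	switch (off) {
	case 8:  return (uint64_t)(int64_t)(int8_t)src;
	case 16: return (uint64_t)(int64_t)(int16_t)src;
	case 32: return (uint64_t)(int64_t)(int32_t)src;
	}
	return src;	/* off == 0: plain MOV */
}

/* Hypothetical model of MOVS with BPF_ALU: sign extend to 32 bits,
 * then zero the upper half of the destination (assumption). */
static uint64_t movs32(uint64_t src, int16_t off)
{
	uint32_t v;

	switch (off) {
	case 8:  v = (uint32_t)(int32_t)(int8_t)src;  break;
	case 16: v = (uint32_t)(int32_t)(int16_t)src; break;
	default: v = (uint32_t)src;                   break;	/* plain MOV */
	}
	return v;
}
```

For a source register holding 0x80 and 'off' 8, movs32 gives 0xffffff80
and movs64 gives 0xffffffffffffff80.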

32-bit JA
=========

Currently, the whole range of operations with BPF_JMP32/BPF_JMP insns is
encoded as below

   ========  =====  =========================  ============
   code      value  description                notes
   ========  =====  =========================  ============
   BPF_JA    0x00   PC += off                  BPF_JMP only
   BPF_JEQ   0x10   PC += off if dst == src
   BPF_JGT   0x20   PC += off if dst > src     unsigned
   BPF_JGE   0x30   PC += off if dst >= src    unsigned
   BPF_JSET  0x40   PC += off if dst & src
   BPF_JNE   0x50   PC += off if dst != src
   BPF_JSGT  0x60   PC += off if dst > src     signed
   BPF_JSGE  0x70   PC += off if dst >= src    signed
   BPF_CALL  0x80   function call
   BPF_EXIT  0x90   function / program return  BPF_JMP only
   BPF_JLT   0xa0   PC += off if dst < src     unsigned
   BPF_JLE   0xb0   PC += off if dst <= src    unsigned
   BPF_JSLT  0xc0   PC += off if dst < src     signed
   BPF_JSLE  0xd0   PC += off if dst <= src    signed
   ========  =====  =========================  ============

Here 'off' is 16 bits wide, so the jump range is [-32768, 32767].
In rare cases, people may have large programs or have loops fully unrolled,
which may push some jump offsets beyond the above range. In that case,
earlier llvm versions silently generate wrong code (the offset is
truncated), while recent llvm versions report a fatal error.

To fix this issue, the following new insn is proposed

   ========  =====  =========================  ==============
   code      value  description                notes
   ========  =====  =========================  ==============
   BPF_JA    0x00   PC += imm                  BPF_JMP32 only
   ========  =====  =========================  ==============

This way, the jump offset range becomes [-2^31, 2^31 - 1].

Other jump instructions, e.g., BPF_JEQ, with a jmp offset
beyond [-32768, 32767] can be simulated with a short range
BPF_JEQ followed by a BPF_JA.
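One possible lowering is sketched below in C. The struct insn_sketch
type and the emit_jeq helper are hypothetical and omit the register and
compare operands; 'rel' is assumed to be the distance from the insn
following the emitted sequence to the jump target. The opcode bytes are
composed from the existing class and code values, with 0x06 being the
proposed BPF_JMP32 | BPF_JA.

```c
#include <stdint.h>

#define OP_JEQ_K 0x15	/* BPF_JMP   | BPF_JEQ | BPF_K */
#define OP_JA    0x05	/* BPF_JMP   | BPF_JA */
#define OP_JA32  0x06	/* BPF_JMP32 | BPF_JA, the proposed insn */

/* Simplified insn: dst/src registers and the compare imm are left out. */
struct insn_sketch {
	uint8_t opcode;
	int16_t off;
	int32_t imm;
};

/* Emit "if (dst == K) goto target". Returns the number of insns
 * written to 'out' (1 or 3), or -1 if 'rel' does not fit in 32 bits. */
static int emit_jeq(struct insn_sketch *out, int64_t rel)
{
	if (rel < INT32_MIN || rel > INT32_MAX)
		return -1;
	if (rel >= INT16_MIN && rel <= INT16_MAX) {
		out[0] = (struct insn_sketch){ .opcode = OP_JEQ_K, .off = (int16_t)rel };
		return 1;
	}
	/* taken: skip the short JA and land on the long jump */
	out[0] = (struct insn_sketch){ .opcode = OP_JEQ_K, .off = 1 };
	/* not taken: skip the long jump */
	out[1] = (struct insn_sketch){ .opcode = OP_JA, .off = 1 };
	/* long jump carrying the 32-bit offset in 'imm' */
	out[2] = (struct insn_sketch){ .opcode = OP_JA32, .imm = (int32_t)rel };
	return 3;
}
```

A compiler could also invert the condition and emit two insns instead
of three; the three-insn form is used here only because it keeps the
original condition unchanged.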

bswap16/32/64
=============

Currently, llvm does not generate bswap16/32/64 properly.
Rather, it generates be16/32/64 or le16/32/64 instructions based on
the endianness of the bpf target being compiled for.
The existing encoding looks like below:

   bpf target     insn code source insn_class imm
   big endian     LE   0xd  LE(0)  BPF_ALU    16/32/64
   little endian  BE   0xd  BE(1)  BPF_ALU    16/32/64

LE insn will do swap if the running target is big endian.
BE insn will do swap if the running target is little endian.
See kernel/bpf/core.c for details.

The new bswap instruction will have the following encoding:
   insn   code source insn_class imm
   BSWAP  0xd  0      BPF_ALU64  16/32/64

The BSWAP insn will swap unconditionally.
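To make the contrast with the existing BE insn concrete, here is a
small C sketch. The helper names bswap_insn and to_be_insn are
hypothetical. Clearing the bits above the swapped width mirrors what
the existing BE/LE insns do in kernel/bpf/core.c and is an assumption
for BSWAP.

```c
#include <stdint.h>

static uint16_t swap16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t swap32(uint32_t v)
{
	return ((uint32_t)swap16((uint16_t)v) << 16) | swap16((uint16_t)(v >> 16));
}

static uint64_t swap64(uint64_t v)
{
	return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Hypothetical model of the proposed BSWAP: always swaps. */
static uint64_t bswap_insn(uint64_t dst, int32_t imm)
{
	switch (imm) {
	case 16: return swap16((uint16_t)dst);
	case 32: return swap32((uint32_t)dst);
	case 64: return swap64(dst);
	}
	return dst;	/* invalid imm, not reachable for valid insns */
}

static int host_is_little_endian(void)
{
	const uint16_t one = 1;

	return *(const uint8_t *)&one == 1;
}

/* Hypothetical model of the existing BE insn: swaps only when the
 * running target is little endian, otherwise just truncates. */
static uint64_t to_be_insn(uint64_t dst, int32_t imm)
{
	if (host_is_little_endian())
		return bswap_insn(dst, imm);
	switch (imm) {
	case 16: return (uint16_t)dst;
	case 32: return (uint32_t)dst;
	}
	return dst;
}
```

On a little endian machine both helpers return the same value; on a
big endian machine only bswap_insn changes the byte order.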

ST
==

The kernel already supports the BPF_ST insn, encoded as below:

   mode(3 bits)      size(2 bits)    instruction class(3 bits)
   BPF_MEM (0x3)     8/16/32/64      BPF_ST

The semantics are:
   *(size *) (dst_reg + off) = imm32
LLVM just needs to implement this instruction under -mcpu=v4. It looks
like gcc can already generate this instruction.
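For reference, the store semantics can be modeled in C roughly as
below. The helper name st_mem is hypothetical and the size is given in
bytes. Sign extending imm32 for the 64-bit store is my understanding of
the interpreter behavior and should be read as an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of BPF_ST | BPF_MEM:
 *   *(size *)(dst_reg + off) = imm32 */
static void st_mem(void *dst_reg, int16_t off, int32_t imm, unsigned int size)
{
	uint8_t *p = (uint8_t *)dst_reg + off;

	switch (size) {
	case 1: { uint8_t v = (uint8_t)imm;   memcpy(p, &v, 1); break; }
	case 2: { uint16_t v = (uint16_t)imm; memcpy(p, &v, 2); break; }
	case 4: { uint32_t v = (uint32_t)imm; memcpy(p, &v, 4); break; }
	/* assumption: imm32 is sign extended to 64 bits */
	case 8: { uint64_t v = (uint64_t)(int64_t)imm; memcpy(p, &v, 8); break; }
	}
}
```

The benefit over the current llvm output is that no scratch register
is needed to hold the constant before the store.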