Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler

Yonghong Song <yhs@xxxxxxxx> · Fri, 13 Jan 2023 11:23:51 -0800

On 1/13/23 12:53 AM, Jose E. Marchesi wrote:

On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:

BPF has two documented (non-atomic) memory store instructions:

BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_ST : *(size *) (dst_reg + off) = imm32

Currently LLVM BPF back-end does not emit BPF_ST instruction and does
not allow one to be specified as inline assembly.

Recently I've been exploring ways to port some of the verifier test
cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
and machinery provided in tools/testing/selftests/bpf/test_loader.c
(which should hopefully simplify tests maintenance).
The BPF_ST instruction is popular in these tests: used in 52 of 94 files.

While it is possible to adjust LLVM to only support BPF_ST for inline
assembly blocks it seems a bit wasteful. This patch-set contains a set
of changes to verifier necessary in case when LLVM is allowed to
freely emit BPF_ST instructions (source code is available here [1]).

Would we gate LLVM's emitting of BPF_ST for C code behind some new
cpu=v4? What is the benefit for compiler to start automatically emit
such instructions? Such thinking about logistics, if there isn't much
benefit, as BPF application owner I wouldn't bother enabling this
behavior risking regressions on old kernels that don't have these
changes.

Hmm, GCC happily generates BPF_ST instructions:

    $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
    $ cat foo.s
          .file	"<stdin>"
          .text
          .align	3
          .global	foo
          .type	foo, @function
    foo:
          lddw	%r0,v
          stw	[%r0+0],666
          exit
          .size	foo, .-foo
          .global	v
          .type	v, @object
          .lcomm	v,4,4
          .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"

Been doing that since October 2019, I think before the cpu versioning
mechanism was got in place?

We weren't aware this was problematic.  Does the verifier reject such
instructions?

Interesting, do BPF selftests generated by GCC pass the same way they
do if generated by clang?

I had to do the following changes to the verifier to make the
selftests pass when BPF_ST instruction is allowed for selection:

- patch #1 in this patchset: track values of constants written to
    stack using BPF_ST. Currently these are tracked imprecisely, unlike
    the writes using BPF_STX, e.g.:
         fp[-8] = 42;   currently verifier assumes that
fp[-8]=mmmmmmmm
                     after such instruction, where m stands for "misc",
                     just a note that something is written at fp[-8].
                          r1 = 42;       verifier tracks r1=42 after
this instruction.
      fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.

    So the patch makes both cases equivalent.
    - patch #3 in this patchset: adjusts
verifier.c:convert_ctx_access()
    to operate on BPF_ST alongside BPF_STX.
       Context parameters for some BPF programs types are "fake"
data
    structures. The verifier matches all BPF_STX and BPF_LDX
    instructions that operate on pointers to such contexts and rewrites
    these instructions. It might change an offset or add another layer
    of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
    (This also implies that verifier forbids writes to non-constant
     offsets inside such structures).
        So the patch extends this logic to also handle BPF_ST.
The patch 3 is necessary to land before llvm starts generating 'st'
for ctx access.
That's clear, but I'm missing why patch 1 is necessary.
Sure, it's making the verifier understand scalar spills with 'st' and
makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
What kind of programs fail to be verified when llvm starts generating 'st' ?
Regarind -mcpu=v4.
I think we need to add all of our upcoming instructions as a single flag.
Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
-mcpu=v4 could mean:
- ST
- sign extending loads
- sign extend a register
- 32-bit JA
- proper bswap insns: bswap16, bswap32, bswap64
The sign and 32-bit JA we've discussed earlier.
The bswap was on my wish list forever.
The existing TO_LE, TO_BE insns are really odd from compiler pov.
The compiler should translate bswap IR op into proper bswap insn
just like it does on all cpus.
Maybe add SDIV to -mcpu=v4 as well?

Right, we should add these insns in llvm17 with -mcpu=v4, so we
can keep the number of cpu generations minimum.

How do you plan to encode the sign-extend load instructions?

I guess a possibility would be to use one of the available op-mode for
load instructions that are currently marked as reserved.  For example:

    IMM  = 0b000
    ABS  = 0b001
    IND  = 0b010
    MEM  = 0b011
    SEM = 0b100  <- new

Then we would have the following code fields for sign-extending LDX
instructions:

    op-mode:SEM op-size:{W,H,B,DW} op-class:LDX

Right, this is what I plan to do as well. See my incomplete llvm
prototype here:
  https://reviews.llvm.org/D133464