Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler

Yonghong Song <yhs@xxxxxxxx> · Fri, 13 Jan 2023 00:02:42 -0800

On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:

BPF has two documented (non-atomic) memory store instructions:

BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_ST : *(size *) (dst_reg + off) = imm32

Currently LLVM BPF back-end does not emit BPF_ST instruction and does
not allow one to be specified as inline assembly.

Recently I've been exploring ways to port some of the verifier test
cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
and machinery provided in tools/testing/selftests/bpf/test_loader.c
(which should hopefully simplify tests maintenance).
The BPF_ST instruction is popular in these tests: used in 52 of 94 files.

While it is possible to adjust LLVM to only support BPF_ST for inline
assembly blocks it seems a bit wasteful. This patch-set contains a set
of changes to verifier necessary in case when LLVM is allowed to
freely emit BPF_ST instructions (source code is available here [1]).

Would we gate LLVM's emitting of BPF_ST for C code behind some new
cpu=v4? What is the benefit for compiler to start automatically emit
such instructions? Such thinking about logistics, if there isn't much
benefit, as BPF application owner I wouldn't bother enabling this
behavior risking regressions on old kernels that don't have these
changes.

Hmm, GCC happily generates BPF_ST instructions:

   $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
   $ cat foo.s
         .file	"<stdin>"
         .text
         .align	3
         .global	foo
         .type	foo, @function
   foo:
         lddw	%r0,v
         stw	[%r0+0],666
         exit
         .size	foo, .-foo
         .global	v
         .type	v, @object
         .lcomm	v,4,4
         .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"

Been doing that since October 2019, I think before the cpu versioning
mechanism was got in place?

We weren't aware this was problematic.  Does the verifier reject such
instructions?

Interesting, do BPF selftests generated by GCC pass the same way they
do if generated by clang?

I had to do the following changes to the verifier to make the
selftests pass when BPF_ST instruction is allowed for selection:

- patch #1 in this patchset: track values of constants written to
   stack using BPF_ST. Currently these are tracked imprecisely, unlike
   the writes using BPF_STX, e.g.:

     fp[-8] = 42;   currently verifier assumes that fp[-8]=mmmmmmmm
                    after such instruction, where m stands for "misc",
                    just a note that something is written at fp[-8].

     r1 = 42;       verifier tracks r1=42 after this instruction.
     fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.

   So the patch makes both cases equivalent.

- patch #3 in this patchset: adjusts verifier.c:convert_ctx_access()
   to operate on BPF_ST alongside BPF_STX.

   Context parameters for some BPF programs types are "fake" data
   structures. The verifier matches all BPF_STX and BPF_LDX
   instructions that operate on pointers to such contexts and rewrites
   these instructions. It might change an offset or add another layer
   of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
   (This also implies that verifier forbids writes to non-constant
    offsets inside such structures).

   So the patch extends this logic to also handle BPF_ST.

The patch 3 is necessary to land before llvm starts generating 'st' for ctx access.
That's clear, but I'm missing why patch 1 is necessary.
Sure, it's making the verifier understand scalar spills with 'st' and
makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
What kind of programs fail to be verified when llvm starts generating 'st' ?

Regarind -mcpu=v4.
I think we need to add all of our upcoming instructions as a single flag.
Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.

-mcpu=v4 could mean:
- ST
- sign extending loads
- sign extend a register
- 32-bit JA
- proper bswap insns: bswap16, bswap32, bswap64

The sign and 32-bit JA we've discussed earlier.
The bswap was on my wish list forever.
The existing TO_LE, TO_BE insns are really odd from compiler pov.
The compiler should translate bswap IR op into proper bswap insn
just like it does on all cpus.

Maybe add SDIV to -mcpu=v4 as well?

Right, we should add these insns in llvm17 with -mcpu=v4, so we
can keep the number of cpu generations minimum.