> On 1/12/23 2:27 PM, Alexei Starovoitov wrote: >> On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote: >>> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote: >>>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: >>>>>> >>>>>> BPF has two documented (non-atomic) memory store instructions: >>>>>> >>>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg >>>>>> BPF_ST : *(size *) (dst_reg + off) = imm32 >>>>>> >>>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does >>>>>> not allow one to be specified as inline assembly. >>>>>> >>>>>> Recently I've been exploring ways to port some of the verifier test >>>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly >>>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c >>>>>> (which should hopefully simplify tests maintenance). >>>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files. >>>>>> >>>>>> While it is possible to adjust LLVM to only support BPF_ST for inline >>>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set >>>>>> of changes to verifier necessary in case when LLVM is allowed to >>>>>> freely emit BPF_ST instructions (source code is available here [1]). >>>>> >>>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new >>>>> cpu=v4? What is the benefit for compiler to start automatically emit >>>>> such instructions? Such thinking about logistics, if there isn't much >>>>> benefit, as BPF application owner I wouldn't bother enabling this >>>>> behavior risking regressions on old kernels that don't have these >>>>> changes. >>>> >>>> Hmm, GCC happily generates BPF_ST instructions: >>>> >>>> $ echo 'int v; void foo () { v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s - >>>> $ cat foo.s >>>> .file "<stdin>" >>>> .text >>>> .align 3 >>>> .global foo >>>> .type foo, @function >>>> foo: >>>> lddw %r0,v >>>> stw [%r0+0],666 >>>> exit >>>> .size foo, .-foo >>>> .global v >>>> .type v, @object >>>> .lcomm v,4,4 >>>> .ident "GCC: (GNU) 12.0.0 20211206 (experimental)" >>>> >>>> Been doing that since October 2019, I think before the cpu versioning >>>> mechanism was got in place? >>>> >>>> We weren't aware this was problematic. Does the verifier reject such >>>> instructions? >>> >>> Interesting, do BPF selftests generated by GCC pass the same way they >>> do if generated by clang? >>> >>> I had to do the following changes to the verifier to make the >>> selftests pass when BPF_ST instruction is allowed for selection: >>> >>> - patch #1 in this patchset: track values of constants written to >>> stack using BPF_ST. Currently these are tracked imprecisely, unlike >>> the writes using BPF_STX, e.g.: >>> fp[-8] = 42; currently verifier assumes that >>> fp[-8]=mmmmmmmm >>> after such instruction, where m stands for "misc", >>> just a note that something is written at fp[-8]. >>> r1 = 42; verifier tracks r1=42 after >>> this instruction. >>> fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction. >>> >>> So the patch makes both cases equivalent. >>> - patch #3 in this patchset: adjusts >>> verifier.c:convert_ctx_access() >>> to operate on BPF_ST alongside BPF_STX. >>> Context parameters for some BPF programs types are "fake" >>> data >>> structures. The verifier matches all BPF_STX and BPF_LDX >>> instructions that operate on pointers to such contexts and rewrites >>> these instructions. It might change an offset or add another layer >>> of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access(). >>> (This also implies that verifier forbids writes to non-constant >>> offsets inside such structures). >>> So the patch extends this logic to also handle BPF_ST. >> The patch 3 is necessary to land before llvm starts generating 'st' >> for ctx access. >> That's clear, but I'm missing why patch 1 is necessary. >> Sure, it's making the verifier understand scalar spills with 'st' and >> makes 'st' equivalent to 'stx', but I'm missing why it's necessary. >> What kind of programs fail to be verified when llvm starts generating 'st' ? >> Regarind -mcpu=v4. >> I think we need to add all of our upcoming instructions as a single flag. >> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them. >> -mcpu=v4 could mean: >> - ST >> - sign extending loads >> - sign extend a register >> - 32-bit JA >> - proper bswap insns: bswap16, bswap32, bswap64 >> The sign and 32-bit JA we've discussed earlier. >> The bswap was on my wish list forever. >> The existing TO_LE, TO_BE insns are really odd from compiler pov. >> The compiler should translate bswap IR op into proper bswap insn >> just like it does on all cpus. >> Maybe add SDIV to -mcpu=v4 as well? > > Right, we should add these insns in llvm17 with -mcpu=v4, so we > can keep the number of cpu generations minimum. How do you plan to encode the sign-extend load instructions? I guess a possibility would be to use one of the available op-mode for load instructions that are currently marked as reserved. For example: IMM = 0b000 ABS = 0b001 IND = 0b010 MEM = 0b011 SEM = 0b100 <- new Then we would have the following code fields for sign-extending LDX instructions: op-mode:SEM op-size:{W,H,B,DW} op-class:LDX