Re: [RFC bpf-next 0/5] Support for BPF_ST instruction in LLVM C compiler

"Jose E. Marchesi" <jose.marchesi@xxxxxxxxxx> · Fri, 13 Jan 2023 09:53:01 +0100

> On 1/12/23 2:27 PM, Alexei Starovoitov wrote:
>> On Thu, Jan 05, 2023 at 02:07:05PM +0200, Eduard Zingerman wrote:
>>> On Thu, 2023-01-05 at 11:06 +0100, Jose E. Marchesi wrote:
>>>>> On Sat, Dec 31, 2022 at 8:31 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>>>>>>
>>>>>> BPF has two documented (non-atomic) memory store instructions:
>>>>>>
>>>>>> BPF_STX: *(size *) (dst_reg + off) = src_reg
>>>>>> BPF_ST : *(size *) (dst_reg + off) = imm32
>>>>>>
>>>>>> Currently LLVM BPF back-end does not emit BPF_ST instruction and does
>>>>>> not allow one to be specified as inline assembly.
>>>>>>
>>>>>> Recently I've been exploring ways to port some of the verifier test
>>>>>> cases from tools/testing/selftests/bpf/verifier/*.c to use inline assembly
>>>>>> and machinery provided in tools/testing/selftests/bpf/test_loader.c
>>>>>> (which should hopefully simplify tests maintenance).
>>>>>> The BPF_ST instruction is popular in these tests: used in 52 of 94 files.
>>>>>>
>>>>>> While it is possible to adjust LLVM to only support BPF_ST for inline
>>>>>> assembly blocks it seems a bit wasteful. This patch-set contains a set
>>>>>> of changes to verifier necessary in case when LLVM is allowed to
>>>>>> freely emit BPF_ST instructions (source code is available here [1]).
>>>>>
>>>>> Would we gate LLVM's emitting of BPF_ST for C code behind some new
>>>>> cpu=v4? What is the benefit for compiler to start automatically emit
>>>>> such instructions? Such thinking about logistics, if there isn't much
>>>>> benefit, as BPF application owner I wouldn't bother enabling this
>>>>> behavior risking regressions on old kernels that don't have these
>>>>> changes.
>>>>
>>>> Hmm, GCC happily generates BPF_ST instructions:
>>>>
>>>>    $ echo 'int v; void foo () {  v = 666; }' | bpf-unknown-none-gcc -O2 -xc -S -o foo.s -
>>>>    $ cat foo.s
>>>>          .file	"<stdin>"
>>>>          .text
>>>>          .align	3
>>>>          .global	foo
>>>>          .type	foo, @function
>>>>    foo:
>>>>          lddw	%r0,v
>>>>          stw	[%r0+0],666
>>>>          exit
>>>>          .size	foo, .-foo
>>>>          .global	v
>>>>          .type	v, @object
>>>>          .lcomm	v,4,4
>>>>          .ident	"GCC: (GNU) 12.0.0 20211206 (experimental)"
>>>>
>>>> Been doing that since October 2019, I think before the cpu versioning
>>>> mechanism was got in place?
>>>>
>>>> We weren't aware this was problematic.  Does the verifier reject such
>>>> instructions?
>>>
>>> Interesting, do BPF selftests generated by GCC pass the same way they
>>> do if generated by clang?
>>>
>>> I had to do the following changes to the verifier to make the
>>> selftests pass when BPF_ST instruction is allowed for selection:
>>>
>>> - patch #1 in this patchset: track values of constants written to
>>>    stack using BPF_ST. Currently these are tracked imprecisely, unlike
>>>    the writes using BPF_STX, e.g.:
>>>         fp[-8] = 42;   currently verifier assumes that
>>> fp[-8]=mmmmmmmm
>>>                     after such instruction, where m stands for "misc",
>>>                     just a note that something is written at fp[-8].
>>>                          r1 = 42;       verifier tracks r1=42 after
>>> this instruction.
>>>      fp[-8] = r1;   verifier tracks fp[-8]=42 after this instruction.
>>>
>>>    So the patch makes both cases equivalent.
>>>    - patch #3 in this patchset: adjusts
>>> verifier.c:convert_ctx_access()
>>>    to operate on BPF_ST alongside BPF_STX.
>>>       Context parameters for some BPF programs types are "fake"
>>> data
>>>    structures. The verifier matches all BPF_STX and BPF_LDX
>>>    instructions that operate on pointers to such contexts and rewrites
>>>    these instructions. It might change an offset or add another layer
>>>    of indirection, etc. E.g. see filter.c:bpf_convert_ctx_access().
>>>    (This also implies that verifier forbids writes to non-constant
>>>     offsets inside such structures).
>>>        So the patch extends this logic to also handle BPF_ST.
>> The patch 3 is necessary to land before llvm starts generating 'st'
>> for ctx access.
>> That's clear, but I'm missing why patch 1 is necessary.
>> Sure, it's making the verifier understand scalar spills with 'st' and
>> makes 'st' equivalent to 'stx', but I'm missing why it's necessary.
>> What kind of programs fail to be verified when llvm starts generating 'st' ?
>> Regarind -mcpu=v4.
>> I think we need to add all of our upcoming instructions as a single flag.
>> Otherwise we'll have -mcpu=v5,v6,v7 and full combinations of them.
>> -mcpu=v4 could mean:
>> - ST
>> - sign extending loads
>> - sign extend a register
>> - 32-bit JA
>> - proper bswap insns: bswap16, bswap32, bswap64
>> The sign and 32-bit JA we've discussed earlier.
>> The bswap was on my wish list forever.
>> The existing TO_LE, TO_BE insns are really odd from compiler pov.
>> The compiler should translate bswap IR op into proper bswap insn
>> just like it does on all cpus.
>> Maybe add SDIV to -mcpu=v4 as well?
>
> Right, we should add these insns in llvm17 with -mcpu=v4, so we
> can keep the number of cpu generations minimum.

How do you plan to encode the sign-extend load instructions?

I guess a possibility would be to use one of the available op-mode for
load instructions that are currently marked as reserved.  For example:

   IMM  = 0b000
   ABS  = 0b001
   IND  = 0b010
   MEM  = 0b011
   SEM = 0b100  <- new

Then we would have the following code fields for sign-extending LDX
instructions:

   op-mode:SEM op-size:{W,H,B,DW} op-class:LDX