On Thu, 2024-01-11 at 19:33 +0100, Jose E. Marchesi wrote: > > > [I have just added a proposal for an agenda item to this week's BPF > > Office Hours so we can discuss about BPF sub-registers and compiler > > constraints, to complement this thread.] > > Notes from the office hours: > > Availability of 32-bit arithmetic instructions: > > (cpu >= v3 AND not disabled with -mno-alu32) OR -malu32 > > Compiler constraints: > > "r" > > 64-bit register (rN) or 32-bit sub-register (wN), based on the mode of > the operand. > > If 32-bit arithmetic available > char, short -> wN and warning > int -> wN > long int -> rN > Else > char, short, int -> rN and warning > long int -> rN > > "w" > > 32-bit sub-register (wN) regardless of the mode of the operand. > > If 32-bit arithmetic available > char, short -> wN and warning > int -> wN > long int -> wN and warning > Else > char, short, int, long int -> wN and warning Hi All, I have a preliminary implementation of agreed logic for "r" and "w" inline asm constraints in LLVM branch [0]: - Constraint "r": | Operand type | Has ALU32 | No ALU32 | |--------------+--------------------+--------------------| | char | warning | warning | | short | warning | warning | | int | use 32-bit "w" reg | warning | | long | use 64-bit "r" reg | use 64-bit "r" reg | - Constraint "w": | Operand type | Has ALU32 | No ALU32 | |--------------+--------------------+----------| | char | warning | warning | | short | warning | warning | | int | use 32-bit "w" reg | warning | | long | warning | warning | The following constraints are not implemented for now: "R", "I", "i", "O. I also have patches, adapting BPF selftests [1], Cilium [2] and Tetragon [3] to compile using both current LLVM main and [0]. After porting, selftests [1] are passing, and there is no verification pass/fail differences when loading [2,3] object files using veristat. The main take-away from the the porting exercise is that all three projects still use CPUv2 fully or in parts of the code-base. The issue with CPUv2 is that 'int' type cannot be passed to input "r" constraint w/o explicit cast, and cannot be passed to output constraint at all. Note that by default integer literals in C code have type 'int', hence passing a constant to "ri" constraint might require cast as well. E.g. changes for input constraints, necessary for CPUv2: - asm("%0 -= %1" : "+r"(off) : "r"(buf->skb->data)); + asm("%0 -= %1" : "+r"(off) : "r"((__u64)buf->skb->data)); E.g. changes for ouput (input/output) constraints, necessary for CPUv2: - int am; + long am; ... asm volatile("%[am] &= 0xffff;\n" ::[am] "+r"(am) Also, selftests use constraint named "g" in one place: #define __sink(expr) asm volatile("" : "+g"(expr)) This constraint is documented in [4] as follows: "g" Any register, memory or immediate integer operand is allowed, except for registers that are not general registers. In [0] I apply to "g" same checks as to "r". This is not fully correct, as it warns about e.g. operand 10 (but not 10L) when built with CPUv2. It looks like some internal Clang interfaces (shared between targets) would need adjustment to allow different semantic action on value vs. constant passed to "g" constraint. The only instance when I had to analyze verifier behavior while porting selftests considers usage of the barrier_var() macro. Test case verif_scale_pyperf600_bpf_loop contains a code sequence similar to following: #define barrier_var(var) asm volatile("" : "+r"(var)) ... void foo(unsigned int i, unsigned char *ptr) { ... barrier_var(i); if (i >= 100) return; ... ptr[i] ...; } Previously such code was translated as follows: w1 = w1 ; sign extension for inline asm operand if w1 > 0x63 goto ... ... r2 += r1 But now sign extension is gone. In the test case the range for 'i' was unknown to verifier prior to comparison and 32-bit comparison only produces range for lower half of the register. Thus verifier reported an error: math between map_value pointer and register with unbounded min value is not allowed Unfortunately, I'm not aware of a simple way in C to force this comparison be done in 64-bits w/o changing 'i' to 64-bit type: InstCombinePass demotes it 32-bit comparison. Hence, I changed the type of 'i' from __u32 to __u64. There are 18 selftest object files with slight differences in code generation when new constraint handling logic of [0] is applied. I'll report on these differences a bit later. Please let me know what you think about impact of the compiler changes on the code base. Thanks, Eduard [0] Updated LLVM https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r [1] selftests https://gist.github.com/eddyz87/276f1ecc51930017dcddbb56e37f57ad [2] Cilium https://gist.github.com/eddyz87/4a485573556012ec730c2de0256a79db Note: this is based upon branch 'libbpf-friendliness' from https://github.com/anakryiko/cilium [3] Tetragon https://gist.github.com/eddyz87/ca9a4b68007c72469307f2cce3f83bb1 [4] https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#index-g-in-constraint