Re: asm register constraint. Was: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

Eduard Zingerman <eddyz87@xxxxxxxxx> · Mon, 15 Jan 2024 18:33:25 +0200

On Thu, 2024-01-11 at 19:33 +0100, Jose E. Marchesi wrote:
> 
> > [I have just added a proposal for an agenda item to this week's BPF
> >  Office Hours so we can discuss about BPF sub-registers and compiler
> >  constraints, to complement this thread.]
> 
> Notes from the office hours:
> 
> Availability of 32-bit arithmetic instructions:
> 
>   (cpu >= v3 AND not disabled with -mno-alu32) OR -malu32
> 
> Compiler constraints:
> 
>   "r"
> 
>      64-bit register (rN) or 32-bit sub-register (wN), based on the mode of
>      the operand.
> 
>      If 32-bit arithmetic available
>         char, short -> wN and warning
>         int -> wN
>         long int -> rN
>      Else
>         char, short, int -> rN and warning
>         long int -> rN
> 
>   "w"
> 
>      32-bit sub-register (wN) regardless of the mode of the operand.
> 
>      If 32-bit arithmetic available
>        char, short -> wN and warning
>        int -> wN
>        long int -> wN and warning
>      Else
>        char, short, int, long int -> wN and warning

Hi All,

I have a preliminary implementation of agreed logic for "r" and "w"
inline asm constraints in LLVM branch [0]:

- Constraint "r":
| Operand type | Has ALU32          | No ALU32           |
|--------------+--------------------+--------------------|
| char         | warning            | warning            |
| short        | warning            | warning            |
| int          | use 32-bit "w" reg | warning            |
| long         | use 64-bit "r" reg | use 64-bit "r" reg |

- Constraint "w":
| Operand type | Has ALU32          | No ALU32 |
|--------------+--------------------+----------|
| char         | warning            | warning  |
| short        | warning            | warning  |
| int          | use 32-bit "w" reg | warning  |
| long         | warning            | warning  |

The following constraints are not implemented for now:
"R", "I", "i", "O.

I also have patches, adapting BPF selftests [1], Cilium [2] and
Tetragon [3] to compile using both current LLVM main and [0].
After porting, selftests [1] are passing, and there is no verification
pass/fail differences when loading [2,3] object files using veristat.

The main take-away from the the porting exercise is that all three
projects still use CPUv2 fully or in parts of the code-base.

The issue with CPUv2 is that 'int' type cannot be passed to input
"r" constraint w/o explicit cast, and cannot be passed to output
constraint at all. Note that by default integer literals in C code
have type 'int', hence passing a constant to "ri" constraint might
require cast as well.

E.g. changes for input constraints, necessary for CPUv2:

-   asm("%0 -= %1" : "+r"(off) : "r"(buf->skb->data));
+   asm("%0 -= %1" : "+r"(off) : "r"((__u64)buf->skb->data));

E.g. changes for ouput (input/output) constraints, necessary for CPUv2:

-   int am;
+   long am;
    ...
    asm volatile("%[am] &= 0xffff;\n" ::[am] "+r"(am)

Also, selftests use constraint named "g" in one place:

  #define __sink(expr) asm volatile("" : "+g"(expr))

This constraint is documented in [4] as follows:

  "g" Any register, memory or immediate integer operand is allowed,
      except for registers that are not general registers.

In [0] I apply to "g" same checks as to "r".
This is not fully correct, as it warns about e.g. operand 10 (but not 10L)
when built with CPUv2. It looks like some internal Clang interfaces
(shared between targets) would need adjustment to allow different
semantic action on value vs. constant passed to "g" constraint.

The only instance when I had to analyze verifier behavior while
porting selftests considers usage of the barrier_var() macro.
Test case verif_scale_pyperf600_bpf_loop contains a code sequence
similar to following:

    #define barrier_var(var) asm volatile("" : "+r"(var))
    ...
    void foo(unsigned int i, unsigned char *ptr) {
      ...
      barrier_var(i);
      if (i >= 100)
        return;
      ... ptr[i] ...;
    }

Previously such code was translated as follows:

    w1 = w1                ; sign extension for inline asm operand
    if w1 > 0x63 goto ...
    ...
    r2 += r1

But now sign extension is gone.
In the test case the range for 'i' was unknown to verifier prior to
comparison and 32-bit comparison only produces range for lower half of
the register. Thus verifier reported an error:

  math between map_value pointer and register with
  unbounded min value is not allowed

Unfortunately, I'm not aware of a simple way in C to force this
comparison be done in 64-bits w/o changing 'i' to 64-bit type:
InstCombinePass demotes it 32-bit comparison.
Hence, I changed the type of 'i' from __u32 to __u64.

There are 18 selftest object files with slight differences in code
generation when new constraint handling logic of [0] is applied.
I'll report on these differences a bit later.

Please let me know what you think about impact of the compiler changes
on the code base.

Thanks,
Eduard

[0] Updated LLVM
    https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r
[1] selftests
    https://gist.github.com/eddyz87/276f1ecc51930017dcddbb56e37f57ad
[2] Cilium
    https://gist.github.com/eddyz87/4a485573556012ec730c2de0256a79db
    Note: this is based upon branch 'libbpf-friendliness'
          from https://github.com/anakryiko/cilium
[3] Tetragon
    https://gist.github.com/eddyz87/ca9a4b68007c72469307f2cce3f83bb1
[4] https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#index-g-in-constraint