Re: asm register constraint. Was: [PATCH v2 bpf-next 2/5] bpf: Introduce "volatile compare" macro

On Mon, 2024-01-08 at 13:33 -0800, Alexei Starovoitov wrote:
[...]

> Agree that llvm fix [6] is a necessary step, then
> using 'w' in v3/v4 and warn on v1/v2 makes sense too,
> but we have this macro:
> #define barrier_var(var) asm volatile("" : "+r"(var))
> that is used in various bpf production programs.
> tetragon is also a heavy user of inline asm.

I have an llvm version [1] that implements 32-bit/64-bit register
selection and it indeed breaks some code when tested on BPF selftests.
The main pain point is that the code base is compiled both with and
without ALU32, so the 'r' constraint cannot be used with 'int'
(and constants are 'int' by default), hence fixes like:

        size_t off = (size_t)buf->head;
-       asm("%0 -= %1" : "+r"(off) : "r"(buf->skb->data));
+       asm("%0 -= %1" : "+r"(off) : "r"((__u64)buf->skb->data));
        return off;

or

 #ifndef bpf_nop_mov
 #define bpf_nop_mov(var) \
-       asm volatile("%[reg]=%[reg]"::[reg]"r"((short)var))
+       asm volatile("%[reg]=%[reg]"::[reg]"r"((unsigned long)var))
 #endif

or

        int save_syn = 1;
        int rv = -1;
        int v = 0;
-       int op;
+       long op;
        ...
        asm volatile (
            "%[op] = *(u32 *)(%[skops] +96)"
            : [op] "+r"(op)
            : [skops] "r"(skops)
            :);

[1] https://github.com/eddyz87/llvm-project/tree/bpf-inline-asm-polymorphic-r

Need a bit more time to share the branch with updates for selftests.

> And, the most importantly, we need a way to go back to old behavior,
> since u32 var; asm("...":: "r"(var)); will now
> allocate "w" register or warn.
>
> What should be the workaround?

The u64 cast is the workaround.
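
E.g. a minimal sketch of a barrier_var() flavor that keeps the old
64-bit behavior (the helper name is invented, only the cast matters):

    /* Widen the asm operand to __u64 so that a 64-bit "r" register is
     * allocated even when var is 32-bit; the statement expression then
     * narrows the value back to the original type. */
    #define barrier_var64(var)                          \
            ({                                          \
                    __u64 __v = (__u64)(var);           \
                    asm volatile("" : "+r"(__v));       \
                    (typeof(var))__v;                   \
            })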

> I've tried:
> u32 var; asm("...":: "r"((u64)var));
>
> https://godbolt.org/z/n4ejvWj7v
>
> and x86/arm64 generate a 32-bit truncation.
> Sounds like the bpf backend has to do it as well.

Here is godbolt output:

    callq   bar()@PLT
    movl    %eax, %eax ; <-- (u64)var is translated as zero extension
    movq    %rax, %rax ; <-- inline asm block uses 64-bit register

Here is LLVM IR for this example before code generation:

    %call = tail call i64 @bar() #2
    %conv1 = and i64 %call, 4294967295 ; <------- `(u64)var` translation
    tail call void asm sideeffect "mov $0, $0",
              "r,~{dirflag},~{fpsr},~{flags}"(i64 %conv1) #2
    ret void
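
The C source behind the godbolt link is not quoted here, but a
reconstruction consistent with the IR above looks roughly like this
(the bar() signature is an assumption):

    unsigned long bar(void);

    void foo(void)
    {
            unsigned int var = bar();  /* 32-bit value */
            /* the cast widens the asm operand to 64 bits */
            asm volatile("mov %0, %0" :: "r"((unsigned long)var));
    }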

> We should be doing 'wX=wX' in such case (just like x86)
> instead of <<=32 >>=32.

Opened pull request for this:
https://github.com/llvm/llvm-project/pull/77501
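
For a zero extension like the one above, the intended change in the
generated BPF code is roughly the following (register number is
illustrative):

    before:  r1 <<= 32
             r1 >>= 32

    after:   w1 = w1    ; 32-bit move zero-extends the upper half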

> We probably need some macro (like we did for __BPF_CPU_VERSION__)
> to identify such fixed llvm, so existing users with '(short)'
> workaround and other tricks can detect new vs old compiler.
> 
> Looks like we opened a can of worms.
> Aligning with x86/arm64 makes sense, but the cost of doing
> the right thing is hard to estimate.

Indeed.
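
FWIW, on the program side such a check could look roughly like the
sketch below; the feature macro name is a placeholder, nothing like
it exists yet:

    /* Placeholder only: __BPF_FEATURE_ASM_REG_WIDTH stands for
     * whatever mechanism the fixed compiler ends up providing
     * (similar in spirit to __BPF_CPU_VERSION__). */
    #ifdef __BPF_FEATURE_ASM_REG_WIDTH
    /* new semantics: "r" follows the operand's C type (w/r register),
     * so pick operand types or casts deliberately */
    #else
    /* old semantics: "r" always means a 64-bit register and the
     * existing (short)/(int) tricks behave as before */
    #endif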




