Re: Masks and overflow of signed immediates in BPF instructions

Yonghong Song <yonghong.song@xxxxxxxxx> · Thu, 17 Aug 2023 10:14:43 -0700

On 8/17/23 9:23 AM, Yonghong Song wrote:


On 8/17/23 1:01 AM, Jose E. Marchesi wrote:

[...]
In llvm, for inline asm, 0xfffffffe, 4294967294 and -2 have the same
4-byte bit-wise encoding, so they will be all encoded the same
0xfffffffe in the actual insn.

The following is an example for x86 target in llvm:

$ cat t.c
int foo() {
   int a, b;

   asm volatile("movl $0xfffffffe, %0" : "=r"(a) :);
   asm volatile("movl $-2, %0" : "=r"(b) :);
   return a + b;
}
$ clang -O2 -c t.c
$ llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
        0: b9 fe ff ff ff                movl    $0xfffffffe, %ecx #
       imm = 0xFFFFFFFE
        5: b8 fe ff ff ff                movl    $0xfffffffe, %eax #
       imm = 0xFFFFFFFE
        a: 01 c8                         addl    %ecx, %eax
        c: c3                            retq
$

Whether it is 0xfffffffe or -2, the insn encoding is the same
and disasm prints out 0xfffffffe.

Thanks for the explanation.

I have pushed the commit below to binutils that makes GAS match the llvm
assembler behavior regarding constant immediates.  With this patch there
are no more assembler errors when building the kernel bpf selftests.

Great! Thanks.


Note however that there is one pending divergence in the behavior of
both assemblers when facing invalid programs where immediate operands
cannot be represented in the number of bits of the field like in:

   $ cat foo.s
   if r1 > r2 goto 0x3fff1

llvm silently truncates it to 16-bit:

   $ clang -target bpf foo.s
   $ bpf-unkonwn-none-objdump -M pseudoc -dr foo.o
   0000000000000000 <.text>:
      0:    2d 21 f1 ff 00 00 00 00     if r1>r2 goto -15

GAS emits an error instead:

   $ as -mdialect=pseudoc foo.s
   foo.s: Assembler messages:
   foo.s:1: Error: pc-relative offset out of range, shall fit in 16 bits.

(The same happens with 32-bit immediates.)

We think the error is pertinent, and we recommend the llvm assembler to
behave the same way.

Thanks! We will take a look at this issue soon.

A patch like below can issue the warning for the above case:

diff --git a/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp 
b/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
index 420a2aad480a..fca6bf30fb4b 100644
--- a/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
+++ b/llvm/lib/Target/BPF/MCTargetDesc/BPFMCCodeEmitter.cpp
@@ -136,6 +136,12 @@ void BPFMCCodeEmitter::encodeInstruction(const 
MCInst &MI,
     OSE.write<uint16_t>(0);
     OSE.write<uint32_t>(Imm >> 32);
   } else {
+    if (Opcode == BPF::JUGT_rr) {
+      const MCOperand &MO = MI.getOperand(2);
+      int64_t Imm = MO.isImm() ? MO.getImm() : 0;
+      if (Imm > INT16_MAX || Imm < INT16_MIN)
+        report_fatal_error("Branch target out of insn range");
+    }
     // Get instruction encoding and emit it
     uint64_t Value = getBinaryCodeForInstr(MI, Fixups, STI);
     CB.push_back(Value >> 56);

Need to generalize to other related conditional/unconditional
operands. Will have a formal patch for llvm soon.

Thanks.