Re: Masks and overflow of signed immediates in BPF instructions

Yonghong Song <yonghong.song@xxxxxxxxx> · Wed, 16 Aug 2023 09:22:09 -0700

On 8/16/23 2:36 AM, Jose E. Marchesi wrote:

On 8/15/23 7:19 AM, Jose E. Marchesi wrote:
Hello.
The selftest progs/verifier_masking.c contains inline assembly code
like:
    	w1 = 0xffffffff;
The 32-bit immediate of that instruction is signed.  Therefore, GAS
complains that the above instruction overflows its field:
    /tmp/ccNOXFQy.s:46: Error: signed immediate out of range, shall
fit in 32 bits
The llvm assembler is likely relying on signed overflow for the
above to
work.

Not really.

   def _ri_32 : ALU_RI<BPF_ALU, Opc, off,
                    (outs GPR32:$dst),
                    (ins GPR32:$src2, i32imm:$imm),
                    "$dst "#OpcodeStr#" $imm",
                    [(set GPR32:$dst, (OpNode GPR32:$src2,
                    i32immSExt32:$imm))]>;

If generating from source, the pattern [(set GPR32:$dst, (OpNode
GPR32:$src2, i32immSExt32:$imm))] so value 0xffffffff is not SExt32
and it won't match and eventually a LDimm_64 insn will be generated.

If by "generating from source" you mean compiling from C, then sure, I
wasn't implying clang was generating `r1 = 0xffffffff' for assigning
that positive value to a register.

But for inline asm, we will have
   (outs GPR32:$dst)
   (ins GPR32:$src2, i32imm:$imm)

and i32imm is defined as
   def i32imm : Operand<i32>;
which is a unsigned 32bit value, so it is recognized properly
and the insn is encoded properly.

We thought the imm32 operand in ALU instructions is signed, not
unsigned.  Is it really unsigned??

I am going through all the BPF instructions that get 32-bit, 16-bit and
64-bit immediates, because it seems to me that we may need to
distinguish between two different levels:

- Value encoded in the instruction immediate: interpreted as signed or
   as unsigned.

The 'imm' in the instruction is a 32-bit signed insn.
I think we have no dispute here.

- How the assembler interprets a written number for the corresponding
   instruction operand: for example, for which instructions the assemler
   shall accept 0xfffffffe and 4294967294 and -2 all to denote the same
   value, what value is it (negative or positive) or shall it emit an
   overflow error.

In llvm, for inline asm, 0xfffffffe, 4294967294 and -2 have the same
4-byte bit-wise encoding, so they will be all encoded the same
0xfffffffe in the actual insn.

The following is an example for x86 target in llvm:

$ cat t.c
int foo() {
  int a, b;

  asm volatile("movl $0xfffffffe, %0" : "=r"(a) :);
  asm volatile("movl $-2, %0" : "=r"(b) :);
  return a + b;
}
$ clang -O2 -c t.c
$ llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
       0: b9 fe ff ff ff                movl    $0xfffffffe, %ecx 
# imm = 0xFFFFFFFE
       5: b8 fe ff ff ff                movl    $0xfffffffe, %eax 
# imm = 0xFFFFFFFE
       a: 01 c8                         addl    %ecx, %eax
       c: c3                            retq
$

Whether it is 0xfffffffe or -2, the insn encoding is the same
and disasm prints out 0xfffffffe.

Will follow up with a summary that hopefully will serve to clarify this.

Using negative numbers to denote masks is ugly and obfuscating (for
non-obvious cases like -1/0xffffffff) so I suggest we introduce a
pseudo-op so we can do:
     w1 = %mask(0xffffffff)

I changed above
   w1 = 0xffffffff;
to
   w1 = %mask(0xffffffff)
and hit the following compilation failure.

progs/verifier_masking.c:54:9: error: invalid % escape in inline
assembly string
    53 |         asm volatile ("                                 \
       |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    54 |         w1 = %mask(0xffffffff);                         \
       |                ^
1 error generated.

Do you have documentation what is '%mask' thing?

It doesn't exist.

I am suggesting to add support for that pseudo-op to the BPF assemblers:
both GAS and the llvm BPF assembler.

allowing the assembler to do the right thing (TM) converting and
checking that the mask is valid and not relying on UB.
Thoughts?