On 8/16/23 2:36 AM, Jose E. Marchesi wrote:
On 8/15/23 7:19 AM, Jose E. Marchesi wrote:
Hello.
The selftest progs/verifier_masking.c contains inline assembly code
like:
w1 = 0xffffffff;
The 32-bit immediate of that instruction is signed. Therefore, GAS
complains that the above instruction overflows its field:
/tmp/ccNOXFQy.s:46: Error: signed immediate out of range, shall
fit in 32 bits
The llvm assembler is likely relying on signed overflow for the
above to
work.
Not really.
def _ri_32 : ALU_RI<BPF_ALU, Opc, off,
(outs GPR32:$dst),
(ins GPR32:$src2, i32imm:$imm),
"$dst "#OpcodeStr#" $imm",
[(set GPR32:$dst, (OpNode GPR32:$src2,
i32immSExt32:$imm))]>;
If generating from source, the pattern [(set GPR32:$dst, (OpNode
GPR32:$src2, i32immSExt32:$imm))] so value 0xffffffff is not SExt32
and it won't match and eventually a LDimm_64 insn will be generated.
If by "generating from source" you mean compiling from C, then sure, I
wasn't implying clang was generating `r1 = 0xffffffff' for assigning
that positive value to a register.
But for inline asm, we will have
(outs GPR32:$dst)
(ins GPR32:$src2, i32imm:$imm)
and i32imm is defined as
def i32imm : Operand<i32>;
which is a unsigned 32bit value, so it is recognized properly
and the insn is encoded properly.
We thought the imm32 operand in ALU instructions is signed, not
unsigned. Is it really unsigned??
I am going through all the BPF instructions that get 32-bit, 16-bit and
64-bit immediates, because it seems to me that we may need to
distinguish between two different levels:
- Value encoded in the instruction immediate: interpreted as signed or
as unsigned.
The 'imm' in the instruction is a 32-bit signed insn.
I think we have no dispute here.
- How the assembler interprets a written number for the corresponding
instruction operand: for example, for which instructions the assemler
shall accept 0xfffffffe and 4294967294 and -2 all to denote the same
value, what value is it (negative or positive) or shall it emit an
overflow error.
In llvm, for inline asm, 0xfffffffe, 4294967294 and -2 have the same
4-byte bit-wise encoding, so they will be all encoded the same
0xfffffffe in the actual insn.
The following is an example for x86 target in llvm:
$ cat t.c
int foo() {
int a, b;
asm volatile("movl $0xfffffffe, %0" : "=r"(a) :);
asm volatile("movl $-2, %0" : "=r"(b) :);
return a + b;
}
$ clang -O2 -c t.c
$ llvm-objdump -d t.o
t.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b9 fe ff ff ff movl $0xfffffffe, %ecx
# imm = 0xFFFFFFFE
5: b8 fe ff ff ff movl $0xfffffffe, %eax
# imm = 0xFFFFFFFE
a: 01 c8 addl %ecx, %eax
c: c3 retq
$
Whether it is 0xfffffffe or -2, the insn encoding is the same
and disasm prints out 0xfffffffe.
Will follow up with a summary that hopefully will serve to clarify this.
Using negative numbers to denote masks is ugly and obfuscating (for
non-obvious cases like -1/0xffffffff) so I suggest we introduce a
pseudo-op so we can do:
w1 = %mask(0xffffffff)
I changed above
w1 = 0xffffffff;
to
w1 = %mask(0xffffffff)
and hit the following compilation failure.
progs/verifier_masking.c:54:9: error: invalid % escape in inline
assembly string
53 | asm volatile (" \
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
54 | w1 = %mask(0xffffffff); \
| ^
1 error generated.
Do you have documentation what is '%mask' thing?
It doesn't exist.
I am suggesting to add support for that pseudo-op to the BPF assemblers:
both GAS and the llvm BPF assembler.
allowing the assembler to do the right thing (TM) converting and
checking that the mask is valid and not relying on UB.
Thoughts?