> On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
>>> On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
>>>> [During LPC 2023 we talked about improving communication between the GCC
>>>> BPF toolchain port and the kernel side. This is the first periodical
>>>> report that we plan to publish in the GCC wiki and send to interested
>>>> parties. Hopefully this will help.]
>>>>
>>>> GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
>>>> IRC channel: #gccbpf at irc.oftc.net.
>>>> Help on using the port: gcc@xxxxxxxxxxx
>>>> Patches and/or development discussions: gcc-patches@xxxxxxx
>>> Thanks a lot for detailed report. Really helpful to nail down
>>> issues facing one or both compilers. See comments below for
>>> some mentioned issues.
>>>
>>>> Assembler
>>>> =========
>>> [...]
>>>
>>>> - In the Pseudo-C syntax register names are not preceded by % characters
>>>>   nor any other prefix. A consequence of that is that in contexts like
>>>>   instruction operands, where both register names and expressions
>>>>   involving symbols are expected, there is no way to disambiguate
>>>>   between them. GAS was allowing symbols like `w3' or `r5' in syntactic
>>>>   contexts where no registers were expected, such as in:
>>>>
>>>>     r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error
>>>>
>>>>   The clang assembler wasn't allowing that. During LPC we agreed that
>>>>   the simplest approach is to not allow any symbol to have the same name
>>>>   than a register, in any context. So we changed GAS so it now doesn't
>>>>   allow to use register names as symbols in any expression, such as:
>>>>
>>>>     r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
>>>>     r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.
>>> Could you provide a reproducible case above for llvm? llvm does not
>>> support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
>>> 'r1 += r2' or 'r1 += 100' syntax.
>> It is a 128-bit load with an expression.
>> In compiler explorer, clang:
>>
>>   int
>>   foo ()
>>   {
>>     asm volatile ("r1 = 10 + w3 ll");
>>     return 0;
>>   }
>>
>> I get:
>>
>>   foo:                                    # @foo
>>           r1 = 10+w3 ll
>>           r0 = 0
>>           exit
>>
>> i.e. `10 + w3' is interpreted as an expression with two operands: the
>> literal number 10 and a symbol (not a register) `w3'.
>>
>> If the expression is `w3+10' instead, your parser recognizes the w3 as a
>> register name and errors out, as expected.
>>
>> I suppose llvm allows to hook on the expression parser to handle
>> individual operands. That's how we handled this in GAS.
>
> Thanks for the code. I can reproduce the result with compiler explorer.
> The following is the link https://godbolt.org/z/GEGexf1Pj
> where I added -grecord-gcc-switches to dump compilation flags
> into .s file.
>
> The following is the compiler explorer compilation command line:
>   /opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
>     -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
>     -g -grecord-command-line /app/example.c
>
> I then compile the above C code with
>   clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
> with identical flags.
>
> I tried locally with llvm16/17/18. They all failed compilation since
> 'r1 = 10+w3 ll' cannot be recognized by the llvm.
> We will investigate why llvm18 in compiler explorer compiles
> differently from my local build.
I updated git llvm master today and I managed to reproduce locally with:

  jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
  clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
  Target: unknown
  Thread model: posix
  InstalledDir: /usr/local/bin

  $ cat foo.c
  int
  foo ()
  {
    asm volatile ("r1 = 10 + w3 ll");
    return 0;
  }

  $ clang -target bpf -c foo.c
  $ llvm-objdump -dr foo.o

  foo.o:  file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
         0:  18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00  r1 = 0xa ll
                  0000000000000000:  R_BPF_64_64  w3
         2:  b7 00 00 00 00 00 00 00  r0 = 0x0
         3:  95 00 00 00 00 00 00 00  exit
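
For reference, here is a rough sketch of the rule we agreed on at LPC: no
identifier in an operand expression may spell a register name, wherever it
appears in the expression. This is not the actual GAS or llvm code. The
helper names (is_register_name, expr_uses_register_name) are made up for
illustration, and the sketch assumes that the register names are just
r0..r10 and w0..w10 and that identifiers consist of letters, digits and
underscores.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if NAME (LEN characters long) spells one of
   r0..r10 or w0..w10.  Assumption: these are the only register names.  */
static bool
is_register_name (const char *name, size_t len)
{
  if (len < 2 || len > 3 || (name[0] != 'r' && name[0] != 'w'))
    return false;
  if (len == 2)
    return isdigit ((unsigned char) name[1]) != 0;
  /* Three characters: only "r10" and "w10" qualify.  */
  return name[1] == '1' && name[2] == '0';
}

/* Hypothetical helper: scan every identifier in EXPR, regardless of its
   position, and report whether any of them spells a register name.  */
static bool
expr_uses_register_name (const char *expr)
{
  const char *p = expr;

  while (*p != '\0')
    {
      if (isalpha ((unsigned char) *p) || *p == '_')
        {
          /* Identifier: collect it and check it against the registers.  */
          const char *start = p;
          while (isalnum ((unsigned char) *p) || *p == '_')
            p++;
          if (is_register_name (start, (size_t) (p - start)))
            return true;
        }
      else if (isdigit ((unsigned char) *p))
        {
          /* Numeric literal such as 10 or 0xa: skip it whole so that its
             letters are not mistaken for an identifier.  */
          while (isalnum ((unsigned char) *p))
            p++;
        }
      else
        p++; /* Operator, whitespace or punctuation.  */
    }
  return false;
}
```

With this check both `w3 + 1' and `1 + w3' are rejected, while `10 + foo'
is still accepted as an expression involving the symbol foo.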