On 11/30/23 10:06 AM, Jose E. Marchesi wrote:
On 11/30/23 7:13 AM, Jose E. Marchesi wrote:
On 11/29/23 2:08 AM, Jose E. Marchesi wrote:
On 11/28/23 11:23 AM, Jose E. Marchesi wrote:
[During LPC 2023 we talked about improving communication between the GCC
BPF toolchain port and the kernel side. This is the first periodical
report that we plan to publish in the GCC wiki and send to interested
parties. Hopefully this will help.]
GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
IRC channel: #gccbpf at irc.oftc.net.
Help on using the port: gcc@xxxxxxxxxxx
Patches and/or development discussions: gcc-patches@xxxxxxx
Thanks a lot for detailed report. Really helpful to nail down
issues facing one or both compilers. See comments below for
some mentioned issues.
Assembler
=========
[...]
- In the Pseudo-C syntax register names are not preceded by % characters
nor any other prefix. A consequence of that is that in contexts like
instruction operands, where both register names and expressions
involving symbols are expected, there is no way to disambiguate
between them. GAS was allowing symbols like `w3' or `r5' in syntactic
contexts where no registers were expected, such as in:
r0 = w3 ll ; GAS interpreted w3 as symbol, clang emits error
The clang assembler wasn't allowing that. During LPC we agreed that
the simplest approach is to not allow any symbol to have the same name
than a register, in any context. So we changed GAS so it now doesn't
allow to use register names as symbols in any expression, such as:
r0 = w3 + 1 ll ; This now fails for both GAS and llvm.
r0 = 1 + w3 ll ; NOTE this does not fail with llvm, but it should.
Could you provide a reproducible case above for llvm? llvm does not
support syntax like 'r0 = 1 + w3 ll'. For add, it only supports
'r1 += r2' or 'r1 += 100' syntax.
It is a 128-bit load with an expression. In compiler explorer, clang:
int
foo ()
{
asm volatile ("r1 = 10 + w3 ll");
return 0;
}
I get:
foo: # @foo
r1 = 10+w3 ll
r0 = 0
exit
i.e. `10 + w3' is interpreted as an expression with two operands: the
literal number 10 and a symbol (not a register) `w3'.
If the expression is `w3+10' instead, your parser recognizes the w3 as a
register name and errors out, as expected.
I suppose llvm allows to hook on the expression parser to handle
individual operands. That's how we handled this in GAS.
Thanks for the code. I can reproduce the result with compiler explorer.
The following is the link https://godbolt.org/z/GEGexf1Pj
where I added -grecord-gcc-switches to dump compilation flags
into .s file.
The following is the compiler explorer compilation command line:
/opt/compiler-explorer/clang-trunk-20231129/bin/clang-18 -g -o /app/output.s \
-S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 \
-g -grecord-command-line /app/example.c
I then compile the above C code with
clang -g -S --target=bpf -fcolor-diagnostics -gen-reproducer=off -O2 -g -grecord-command-line t.c
with identical flags.
I tried locally with llvm16/17/18. They all failed compilation since
'r1 = 10+w3 ll' cannot be recognized by the llvm.
We will investigate why llvm18 in compiler explorer compiles
differently from my local build.
I updated git llvm master today and I managed to reproduce locally with:
jemarch@termi:~/gnu/src/llvm-project/llvm/build$ clang --version
clang version 18.0.0 (https://github.com/llvm/llvm-project.git 586986a063ee4b9a7490aac102e103bab121c764)
Target: unknown
Thread model: posix
InstalledDir: /usr/local/bin
$ cat foo.c
int
foo ()
{
asm volatile ("r1 = 10 + w3 ll");
return 0;
}
$ clang -target bpf -c foo.c
$ llvm-objdump -dr foo.o
foo.o: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <foo>:
0: 18 01 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 r1 = 0xa ll
0000000000000000: R_BPF_64_64 w3
2: b7 00 00 00 00 00 00 00 r0 = 0x0
3: 95 00 00 00 00 00 00 00 exit
Could you share the cmake command line options when you build you clang?
My cmake command line looks like
cmake .. -DCMAKE_BUILD_TYPE=Release -G Ninja \
-DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" \
-DLLVM_TARGETS_TO_BUILD="BPF;X86" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_ZLIB=ON \
-DCMAKE_INSTALL_PREFIX=$PWD/install
and cannot reproduce the issue.
I don't have the original cmake command, I executed it long ago
(rebuilding clang/llvm in my laptop takes three days or more so I do it
incrementally.)
I see this in my CMakeCache.txt:
LLVM_ENABLE_PROJECTS:STRING=clang
LLVM_TARGETS_TO_BUILD:STRING=BPF
LLVM_ENABLE_ASSERTIONS:BOOL=OFF
LLVM_ENABLE_ZLIB:STRING=ON
CMAKE_INSTALL_PREFIX:PATH=/usr/local
Thanks for your cmake command line options. Looks like the reason is due to LLVM_ENABLE_ASSERTIONS=OFF while in my case
LLVM_ENABLE_ASSERTIONS=ON.
The related function in llvm is:
static void printExpr(const MCExpr *Expr, raw_ostream &O) {
#ifndef NDEBUG
const MCSymbolRefExpr *SRE;
if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
else
SRE = dyn_cast<MCSymbolRefExpr>(Expr);
assert(SRE && "Unexpected MCExpr type.");
MCSymbolRefExpr::VariantKind Kind = SRE->getKind();
assert(Kind == MCSymbolRefExpr::VK_None);
#endif
O << *Expr;
}
If LLVM_ENABLE_ASSERTIONS=ON, NDEBUG will not be defined
and 'assert' will actually do assertion.
If LLVM_ENABLE_ASSERTIONS=OFF, NDEBUG will be defined
and 'assert' will be a noop.
That is why ASSERTIONS OFF flag is okay while ASSERTIONS ON
will cause the following error:
$ clang --target=bpf -g -S -O2 t.c
clang: ../lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp:46: void printExpr(const llvm::MCExpr*, llvm::raw_ostream&):
Assertion `SRE && "Unexpected MCExpr type."' failed.
.... stack trace etc. ....
I also tried with my local redhat built clang15 and it didn't produce error either.
$ /bin/clang --target=bpf -g -S -O2 t.c
$ rpm -qf /bin/clang
clang-16.0.6-1.el9.x86_64
Looks like their cmake options does not have LLVM_ENABLE_ASSERTIONS at all
which I assume is OFF. See
https://gitlab.com/redhat/centos-stream/rpms/clang/-/blob/4fcf8241b99430ba239c5461b962fea1f3107a22/clang.spec
clang really does not support this syntax:
r1 = 10 + w3 ll
The following clang patch will emit error regardless of LLVM_ENABLE_ASSERTIONS value.
======
diff --git a/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp b/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
index 15ab55f95e69..c266538bec73 100644
--- a/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
+++ b/llvm/lib/Target/BPF/MCTargetDesc/BPFInstPrinter.cpp
@@ -36,15 +36,16 @@ void BPFInstPrinter::printInst(const MCInst *MI, uint64_t Address,
}
static void printExpr(const MCExpr *Expr, raw_ostream &O) {
-#ifndef NDEBUG
const MCSymbolRefExpr *SRE;
if (const MCBinaryExpr *BE = dyn_cast<MCBinaryExpr>(Expr))
SRE = dyn_cast<MCSymbolRefExpr>(BE->getLHS());
else
SRE = dyn_cast<MCSymbolRefExpr>(Expr);
- assert(SRE && "Unexpected MCExpr type.");
+ if (!SRE)
+ report_fatal_error("Unexpected MCExpr type.");
+#ifndef NDEBUG
MCSymbolRefExpr::VariantKind Kind = SRE->getKind();
assert(Kind == MCSymbolRefExpr::VK_None);
=======