[During LPC 2023 we talked about improving communication between the
 GCC BPF toolchain port and the kernel side.  This is the first of the
 periodic reports that we plan to publish in the GCC wiki and send to
 interested parties.  Hopefully this will help.]

GCC wiki page for the port: https://gcc.gnu.org/wiki/BPFBackEnd
IRC channel: #gccbpf at irc.oftc.net.
Help on using the port: gcc@xxxxxxxxxxx
Patches and/or development discussions: gcc-patches@xxxxxxx

Assembler
=========

- The BPF assembler was sometimes generating spurious symbols.

  The problem was that supporting the pseudo-C assembly syntax for BPF
  makes it impossible to use the traditional technique of hashing on
  mnemonics.  Instead, we are forced to attempt parsing entries in our
  opcodes table until some instruction template matches.  In some
  cases this was causing the parser to incorrectly parse part of an
  instruction opcode as an expression, which led to the creation of a
  new undefined symbol.

  David Faust installed a fix for this upstream:
  https://sourceware.org/pipermail/binutils/2023-November/130668.html

- gas: change meaning of ; in the BPF assembler.

  The clang assembler interprets semicolons as a statement/directive
  separator.  In the GNU BPF assembler that character was being
  interpreted as the beginning of a line comment, as is usual in
  assembly languages.  We detected this discrepancy with snippets
  like:

    asm volatile ("                     \
        if r1 >= 0 goto l0_%=;          \
        r0 = 1;                         \
        r0 += 2;                        \
    l0_%=:  exit;                       \
    "       ::: __clobber_all);

  We installed a patch upstream that makes GAS behave like the clang
  assembler when interpreting semicolons in assembly programs:

  Jose E. Marchesi
  https://sourceware.org/pipermail/binutils/2023-November/130867.html

  The simulator tests have been updated accordingly:

  Jose E. Marchesi
  https://sourceware.org/pipermail/gdb-patches/2023-November/204581.html

- In the Pseudo-C syntax register names are not preceded by %
  characters nor any other prefix.
  A consequence of that is that in contexts like instruction operands,
  where both register names and expressions involving symbols are
  expected, there is no way to disambiguate between them.  GAS was
  allowing symbols like `w3' or `r5' in syntactic contexts where no
  registers were expected, such as in:

    r0 = w3 ll  ; GAS interpreted w3 as symbol, clang emits error

  The clang assembler wasn't allowing that.  During LPC we agreed that
  the simplest approach is to not allow any symbol to have the same
  name as a register, in any context.  So we changed GAS so that it no
  longer allows register names to be used as symbols in any
  expression, such as:

    r0 = w3 + 1 ll  ; This now fails for both GAS and llvm.
    r0 = 1 + w3 ll  ; NOTE this does not fail with llvm, but it should.

  We installed a patch in GAS for this.

  Jose E. Marchesi
  https://sourceware.org/pipermail/binutils/2023-November/130684.html

- Cupertino Miranda fixed a GAS bug in the parsing of registers in
  pseudo-c syntax mode:
  https://sourceware.org/pipermail/binutils/2023-November/130732.html

Compiler
========

- Remove bpf-helpers.h.

  Now that we are finally able to use the kernel provided
  bpf_helpers.h file and associated machinery, there is no longer any
  need to distribute our own version.

  Jose E. Marchesi
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638226.html

- Restore BPF build, always_inline in libgcc

  Jose E. Marchesi
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637948.html

- Fix expected regexp in gcc.target/bpf/ldxdw.c test

  Jose E. Marchesi
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635892.html

- Fix pseudo-c asm emitted for *mulsidi3_zeroextend

  Jose E. Marchesi
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635896.html

- Corrected condition in core_mark_as_access index.

  Cupertino Miranda
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636389.html

- Delayed the removal of the parser enum plugin handler.
  Cupertino Miranda
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636388.html

- Force inlining __builtin_memcmp up to data sizes of 1024 bytes.

  Cupertino Miranda
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636390.html

- Emit errors for libcalls and builtin-generated libcalls, like clang
  does.

  Cupertino Miranda
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638117.html

- GCC was emitting funcall external declarations corresponding to
  attempted but eventually discarded code.  This happened for example
  when GCC tried some particular code sequence that got discarded
  because there was another, more performant alternative.  This was a
  problem with BPF instruction set versions <= v3, which lack signed
  division.  This is now fixed upstream.

  Jose E. Marchesi
  BZ 109253
  https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638143.html

- Indu Bhagat is investigating a BTF generation problem that results
  in non-anonymous FUNC_PROTO entries, which are not allowed in BTF
  and are rejected by the BPF loader.  This apparently happens when
  functions get inlined.

Pending Patches for bpf-next
============================

These are the patches we still have to submit to bpf@vger for
bpf-next.  We are in the process of testing them:

- bpf: add more options for gcc-bpf to selftests/bpf/Makefile

  This patch passes the following extra options to BPF_GCC in
  GCC_BPF_BUILD_RULE:

    -masm=pseudoc
    -mco-re
    -Wno-unknown-pragmas
    -Wno-unused-variable
    -Wno-error=attributes
    -Wno-error=address-of-packed-member
    -Wno-compare-distinct-pointer-types
    -fno-strict-aliasing

  Most of them disable certain warnings or stop them from being
  treated as errors.

  Code like:

    #define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))

  where `expr' is something like a pointer to a bpf_insn, requires
  disabling strict aliasing, which is activated by default with -O2 in
  GCC.
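
  As an illustration of the aliasing rule involved, here is a minimal
  sketch in plain C.  The `struct insn8' type and the function name
  are hypothetical and exist only for this example; they are not the
  kernel's struct bpf_insn nor part of the selftests.  The sketch is
  also not a drop-in replacement for __imm_insn, which needs a
  constant expression for the "i" constraint.  It only shows a way of
  reading an object's bits as an integer that stays valid when strict
  aliasing is enabled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-byte record standing in for an instruction encoding.
   1 + 1 + 2 + 4 bytes, so there is no padding on common ABIs. */
struct insn8 {
    uint8_t code;
    uint8_t regs;
    int16_t off;
    int32_t imm;
};

_Static_assert(sizeof(struct insn8) == sizeof(int64_t),
               "struct insn8 must be exactly 8 bytes");

/* The pattern used by the macro amounts to:
 *
 *     *(long *)&insn
 *
 * i.e. reading a struct object through an lvalue of an unrelated
 * integer type.  With strict aliasing enabled (the default at -O2 in
 * GCC) the compiler may assume that such accesses never alias, hence
 * the need for -fno-strict-aliasing.
 *
 * Copying the object representation with memcpy is valid with or
 * without -fno-strict-aliasing. */
static inline int64_t insn_bits_memcpy(const struct insn8 *insn)
{
    int64_t bits;

    memcpy(&bits, insn, sizeof bits);
    return bits;
}
```

  The numeric value returned depends on the byte order of the target,
  so the sketch makes no claim about it beyond round-tripping the same
  bytes.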

- bpf: use r constraint instead of p constraint

  This was discussed in bpf@vger and it was decided that we would stop
  using the "p" constraint in the BPF kernel selftests.  That
  constraint is not really meant to be used outside of the compiler.

  https://lore.kernel.org/bpf/87edkbnq14.fsf@xxxxxxxxxx/

- bpf_core_read.h: GCC specific macro for preserve_enum_value

  This patch adds a version of the bpf_core_enum_value macro to be
  used by GCC.  The implementations of the CO-RE built-ins in clang
  and GCC require different "magical expressions" to be passed to the
  built-ins.  These macros hide that complexity from the user.

- bpf: avoid VLAs in progs/test_xdp_dynptr.c

  In progs/test_xdp_dynptr.c there are a bunch of VLAs in the
  handle_ipv4 and handle_ipv6 functions:

    const size_t tcphdr_sz = sizeof(struct tcphdr);
    const size_t udphdr_sz = sizeof(struct udphdr);
    const size_t ethhdr_sz = sizeof(struct ethhdr);
    const size_t iphdr_sz = sizeof(struct iphdr);
    const size_t ipv6hdr_sz = sizeof(struct ipv6hdr);

    [...]

    static __always_inline int handle_ipv6(struct xdp_md *xdp, struct bpf_dynptr *xdp_ptr)
    {
            __u8 eth_buffer[ethhdr_sz + ipv6hdr_sz + ethhdr_sz];
            __u8 ip6h_buffer_tcp[ipv6hdr_sz + tcphdr_sz];
            __u8 ip6h_buffer_udp[ipv6hdr_sz + udphdr_sz];
            [...]
    }

  Neither GCC nor clang allows dynamic stack allocation for BPF (we
  used to support it in GCC using one register as an auxiliary stack
  pointer, but not any longer).
  The above code builds with clang but not with GCC:

    progs/test_xdp_dynptr.c:79:14: error: BPF does not support dynamic stack allocation
       79 |         __u8 eth_buffer[ethhdr_sz + iphdr_sz + ethhdr_sz];
          |              ^~~~~~~~~~

  Our guess is that clang turns these arrays from VLAs into normal
  statically sized arrays because ethhdr_sz and friends are constant
  and set to sizeof, which is always known at compile time.

  This patch changes the selftest to use preprocessor constants
  instead of variables:

    #define tcphdr_sz sizeof(struct tcphdr)
    #define udphdr_sz sizeof(struct udphdr)
    #define ethhdr_sz sizeof(struct ethhdr)
    #define iphdr_sz sizeof(struct iphdr)
    #define ipv6hdr_sz sizeof(struct ipv6hdr)

- bpf_helpers.h: define bpf_tail_call_static when building with GCC

- bpf: fix constraint in test_tcpbpf_kern.c

  GCC emits a warning, promoted to an error by -Werror:

    progs/test_tcpbpf_kern.c:60:9: error: ‘op’ is used uninitialized [-Werror=uninitialized]

  when the uninitialized automatic `op' is used with a "+r" constraint
  in:

    asm volatile (
            "%[op] = *(u32 *)(%[skops] +96)"
            : [op] "+r"(op)
            : [skops] "r"(skops)
            :);

  A "+r" operand is both read and written, whereas this asm only
  writes `op'.  The constraint should be "=r" instead.

Open Questions
==============

- BPF programs including libc headers.

  BPF programs run on their own, without an operating system or a C
  library.  Implementing C implies providing certain definitions and
  headers, such as stdint.h and stdarg.h.  For such targets, known as
  "bare metal targets", the compiler has to provide these definitions
  and headers in order to implement the language.

  GCC provides the following C headers for BPF targets:

    float.h
    gcov.h
    iso646.h
    limits.h
    stdalign.h
    stdarg.h
    stdatomic.h
    stdbool.h
    stdckdint.h
    stddef.h
    stdfix.h
    stdint.h
    stdnoreturn.h
    syslimits.h
    tgmath.h
    unwind.h
    varargs.h

  However, we have found that there is at least one BPF kernel
  selftest that includes glibc headers that, indirectly, include
  glibc's own definitions of stdint.h and friends.  This leads to
  compile-time errors due to conflicting types.

  We think that including headers from a glibc built for some host
  target is very questionable.  For example, in BPF a C `char' is
  defined to be signed.  But if a BPF program includes glibc headers
  in an android system, that code will assume an unsigned char
  instead.

Other Updates
=============

- Brian Witte has adapted the Waldo 80211 debug/test/trace wireless
  analyzer tool to be built with GCC BPF.  This includes CI that uses
  the latest GCC git version, which is quite useful for us.

  https://git.sr.ht/~brianwitte/waldo-80211

- Brian has also published a very simple, tested and documented BPF
  program example, with the goal of providing an accessible and
  instructive starting point for those interested in BPF development
  with the GNU toolchain.

  https://git.sr.ht/~brianwitte/gcc-bpf-example