Re: [PATCH bpf-next] selftests/bpf: Fix pyperf180 compilation failure with llvm18

On 11/9/23 3:47 AM, Eduard Zingerman wrote:
On Wed, 2023-11-08 at 21:30 -0800, Yonghong Song wrote:
With latest llvm18 (main branch of llvm-project repo), when building bpf selftests,
     [~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j

The following compilation error happens:
     fatal error: error in backend: Branch target out of insn range
     ...
     Stack dump:
     0.      Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
       -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
       -I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
       -I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
       /home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
       -idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
       -c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
     1.      <eof> parser at end of file
     2.      Code generation
     ...

The compilation failure only happens with cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports a 32-bit branch target offset.
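
To illustrate the range limit described above, here is a minimal host-side
sketch in C. The helper names (fits_off16, fits_off32, branch_encodable) are
hypothetical and are not part of llvm or the kernel. The sketch assumes that
branch distances are counted in instructions, that cpu v1/v2/v3 can only
encode a signed 16-bit distance, and that cpu=v4 can encode a signed 32-bit
one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does the distance fit a signed 16-bit offset? */
static bool fits_off16(int64_t insn_delta)
{
	return insn_delta >= INT16_MIN && insn_delta <= INT16_MAX;
}

/* Hypothetical helper: does the distance fit a signed 32-bit offset? */
static bool fits_off32(int64_t insn_delta)
{
	return insn_delta >= INT32_MIN && insn_delta <= INT32_MAX;
}

/*
 * Hypothetical model of the backend check: cpu v1..v3 are limited to
 * 16-bit branch offsets, cpu v4 is assumed to allow 32-bit offsets.
 */
static bool branch_encodable(int cpu_version, int64_t insn_delta)
{
	assert(cpu_version >= 1 && cpu_version <= 4);
	return cpu_version >= 4 ? fits_off32(insn_delta)
				: fits_off16(insn_delta);
}
```

In this model, a distance of 32768 instructions is rejected for cpu=v3 but
accepted for cpu=v4, which matches the failure pattern reported above.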

The above failure is due to upstream llvm patch [1], which changed some inlining
behavior in llvm18.

Previously, all 180 loop iterations were fully unrolled. To work around the issue,
the full unroll count is now reduced to 90 for llvm18 and later. This shortens
some otherwise long branch target distances and fixes the compilation failure.

   [1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e

Signed-off-by: Yonghong Song <yonghong.song@xxxxxxxxx>
Can confirm, the issue is present on clang main w/o this patch and
disappears after this patch.

Yonghong, is there a way to keep original UNROLL_COUNT if cpuv4 is used?

I thought about this but was a little lazy and did not give it enough thought.
But since you mentioned it, I think adding a macro in llvm to indicate the cpu
version is a good idea. This will give bpf developers some flexibility to
add new features (for a new cpu variant) or work around bugs (for a particular
cpu variant, without impacting others that are fine), etc.

So here is the llvm patch: https://github.com/llvm/llvm-project/pull/71856

With the above llvm patch, the following code change should work:

diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..2473845d1ee2 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
 // Copyright (c) 2019 Facebook
 #define STACK_MAX_LEN 180
+
+/* llvm upstream commit at llvm18
+ *   https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a566eebb16e
+ * changed inlining behavior and caused a compilation failure, as some branch
+ * target distances exceeded the 16-bit representation, which is the maximum
+ * for cpu v1/v2/v3. Macro __bpf_cpu_version__ is implemented in llvm18 to
+ * specify which cpu version is used for compilation. So we can set a smaller
+ * unroll_count if __bpf_cpu_version__ is less than 4, which reduces
+ * some branch target distances and resolves the compilation failure.
+ */
+#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4
+#define UNROLL_COUNT 90
+#endif
+
 #include "pyperf.h"
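
The guard in the diff only takes effect when the compiler defines
__bpf_cpu_version__. As a rough sketch of how it behaves otherwise, the
snippet below repeats the guard and adds a fallback to STACK_MAX_LEN. The
fallback and the effective_unroll_count() helper are assumptions made for
illustration; the real pyperf.h may consume UNROLL_COUNT differently.

```c
#include <assert.h>

#define STACK_MAX_LEN 180

/* Same guard as in the diff: active only if the compiler defines the macro. */
#if defined(__bpf_cpu_version__) && __bpf_cpu_version__ < 4
#define UNROLL_COUNT 90
#endif

/* Assumed fallback (hypothetical): unroll the whole loop by default. */
#ifndef UNROLL_COUNT
#define UNROLL_COUNT STACK_MAX_LEN
#endif

/* Sanity check for this sketch: never unroll more than the loop bound. */
static_assert(UNROLL_COUNT <= STACK_MAX_LEN,
	      "unroll count must not exceed the loop bound");

/* Hypothetical helper exposing the value chosen by the preprocessor. */
static int effective_unroll_count(void)
{
	return UNROLL_COUNT;
}
```

When built by a compiler that does not define __bpf_cpu_version__ (a host
compiler, or a clang without the llvm patch above), the guard is skipped and
the count in this sketch stays at 180.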



Tested-by: Eduard Zingerman <eddyz87@xxxxxxxxx>



