clang-10 has a broken optimization stage that doesn't enable the
compiler to prove at compile time that certain memcpys are within
bounds, and thus the outline memcpy is always called, resulting in
horrific performance, and in some cases, excessive stack frame growth.
Here's a simple reproducer:

    typedef unsigned long size_t;
    void *c(void *dest, const void *src, size_t n) __asm__("memcpy");
    extern inline __attribute__((gnu_inline)) void *memcpy(void *dest, const void *src, size_t n) { return c(dest, src, n); }

    void blah(char *a)
    {
      unsigned long long b[10], c[10];
      int i;

      memcpy(b, a, sizeof(b));
      for (i = 0; i < 10; ++i)
        c[i] = b[i] ^ b[9 - i];
      for (i = 0; i < 10; ++i)
        b[i] = c[i] ^ a[i];
      memcpy(a, b, sizeof(b));
    }

Compile this with clang-9 and clang-10 and observe:

    zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-10 -Wframe-larger-than=0 -O3 -c b.c -o c10.o
    b.c:5:6: warning: stack frame size of 104 bytes in function 'blah' [-Wframe-larger-than=]
    void blah(char *a)
         ^
    1 warning generated.
    zx2c4@thinkpad /tmp/curve25519-hacl64-stack-frame-size-test $ clang-9 -Wframe-larger-than=0 -O3 -c b.c -o c9.o

Looking at the disassembly of c10.o and c9.o, one can see that c9.o is
properly optimized in the obvious way one would expect, while c10.o has
blown up and includes extern calls to memcpy.

This is present on the most trivial bits of code. Thus, for clang-10, we
just set __NO_FORTIFY globally, so that this issue won't be incurred.

Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: LKML <linux-kernel@xxxxxxxxxxxxxxx>
Cc: clang-built-linux <clang-built-linux@xxxxxxxxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>
Cc: George Burgess <gbiv@xxxxxxxxxx>
Cc: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
Link: https://bugs.llvm.org/show_bug.cgi?id=45802
Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx>
---
 Makefile | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Makefile b/Makefile
index 49b2709ff44e..f022f077591d 100644
--- a/Makefile
+++ b/Makefile
@@ -768,6 +768,13 @@ KBUILD_CFLAGS += -Wno-gnu
 # source of a reference will be _MergedGlobals and not on of the whitelisted names.
 # See modpost pattern 2
 KBUILD_CFLAGS += -mno-global-merge
+
+# clang-10 has a broken optimization stage that causes memcpy to always be
+# outline, resulting in excessive stack frame growth and poor performance.
+ifeq ($(shell test $(CONFIG_CLANG_VERSION) -ge 100000 && test $(CONFIG_CLANG_VERSION) -lt 110000; echo $$?),0)
+KBUILD_CFLAGS += -D__NO_FORTIFY
+endif
+
 else
 
 # These warnings generated too much noise in a regular build.
-- 
2.26.2