From: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> [ Upstream commit 158807de5822d1079e162a3762956fd743dd483e ] Clang fails to compile __get_user_size() on 32-bit for the following code: long long val; __get_user(val, usrptr); with: error: invalid output size for constraint '=q' GCC compiles the same code without complaints. The reason is that GCC and Clang are architecturally different, which leads to subtle issues for code that's invalid but clearly dead, i.e. with code that emulates polymorphism with the preprocessor and sizeof. GCC will perform semantic analysis after early inlining and dead code elimination, so it will not warn on invalid code that's dead. Clang strictly performs optimizations after semantic analysis, so it will warn for dead code. Neither Clang nor GCC like this very much with -m32: long long ret; asm ("movb $5, %0" : "=q" (ret)); However, GCC can tolerate this variant: long long ret; switch (sizeof(ret)) { case 1: asm ("movb $5, %0" : "=q" (ret)); break; case 8:; } Clang, on the other hand, won't accept that because it validates the inline asm for the '1' case before the optimisation phase where it realises that it wouldn't have to emit it anyway. If LLVM (Clang's "back end") fails such as during instruction selection or register allocation, it cannot provide accurate diagnostics (warnings / errors) that contain line information, as the AST has been discarded from memory at that point. While there have been early discussions about having C/C++ specific language optimizations in Clang via the use of MLIR, which would enable such earlier optimizations, such work is not scoped and likely a multi-year endeavor. It was discussed to change the asm output constraint for the one byte case from "=q" to "=r". While it works for 64-bit, it fails on 32-bit. With '=r' the compiler could fail to chose a register accessible as high/low which is required for the byte operation. If that happens the assembly will fail. Use a local temporary variable of type 'unsigned char' as output for the byte copy inline asm and then assign it to the real output variable. This prevents Clang from failing the semantic analysis in the above case. The resulting code for the actual one byte copy is not affected as the temporary variable is optimized out. [ tglx: Amended changelog ] Reported-by: Arnd Bergmann <arnd@xxxxxxxx> Reported-by: David Woodhouse <dwmw2@xxxxxxxxxxxxx> Reported-by: Dmitry Golovin <dima@xxxxxxxxxx> Reported-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Signed-off-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Tested-by: Sedat Dilek <sedat.dilek@xxxxxxxxx> Acked-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Acked-by: Dennis Zhou <dennis@xxxxxxxxxx> Link: https://bugs.llvm.org/show_bug.cgi?id=33587 Link: https://github.com/ClangBuiltLinux/linux/issues/3 Link: https://github.com/ClangBuiltLinux/linux/issues/194 Link: https://github.com/ClangBuiltLinux/linux/issues/781 Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dwmw2@xxxxxxxxxxxxx/ Link: https://lore.kernel.org/lkml/CAK8P3a1EBaWdbAEzirFDSgHVJMtWjuNt2HGG8z+vpXeNHwETFQ@xxxxxxxxxxxxxx/ Link: https://lkml.kernel.org/r/20200720204925.3654302-12-ndesaulniers@xxxxxxxxxx Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> --- arch/x86/include/asm/uaccess.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index d8f283b9a569c..d1323c73cf6d2 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -314,11 +314,14 @@ do { \ #define __get_user_size(x, ptr, size, retval) \ do { \ + unsigned char x_u8__; \ + \ retval = 0; \ __chk_user_ptr(ptr); \ switch (size) { \ case 1: \ - __get_user_asm(x, ptr, retval, "b", "=q"); \ + __get_user_asm(x_u8__, ptr, retval, "b", "=q"); \ + (x) = x_u8__; \ break; \ case 2: \ __get_user_asm(x, ptr, retval, "w", "=r"); \ -- 2.25.1