On Tue, May 18, 2021 at 4:56 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote: > > > > > > I wonder if the kernel should do the same, or whether there are still cases > > > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but > > > it was fixed in gcc 6. > > > > It would have to be memmove(), not memcpy() in this case, right? > > No, it would simply be something like > > #define __get_unaligned_t(type, ptr) \ > ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; }) > > #define get_unaligned(ptr) \ > __get_unaligned_t(typeof(*(ptr)), ptr) > > but honestly, the likelihood that the compiler generates something > horrible (possibly because of KASAN etc) is uncomfortably high. > > I'd prefer the __packed thing. We don't actually use -O3, and it's > considered a bad idea, and the gcc bug is as such less likely than > just the above generating unacceptable code (we have several cases > where "bad code generation" ends up being an actual bug, since we > depend on inlining and depend on some code sequences not generating > calls etc). I think the important question is whether we know that the bug that Eric pointed to can only happen with -O3, or whether it is something in gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally well get triggered on some other architecture without -O3 or vector instructions enabled. >From the gcc fix at https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b it looks like this code path is entered when compiling with -ftree-loop-vectorize, which is documented as '-ftree-loop-vectorize' Perform loop vectorization on trees. This flag is enabled by default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and '-fauto-profile'. -ftree-vectorize is set in arch/arm/lib/xor-neon.c -O3 is set for the lz4 and zstd compression helpers and for wireguard. To be on the safe side, we could pass -fno-tree-loop-vectorize along with -O3 on the affected gcc versions, or use a bigger hammer (not use -O3 at all, always set -fno-tree-loop-vectorize, ...). Arnd