On Mon, May 17, 2021 at 11:53 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote: > On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote: > > From: Arnd Bergmann <arnd@xxxxxxxx> > > > > As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected > > behavior when faced with overlapping unaligned pointers. The kernel's > > unaligned/access-ok.h header technically invokes undefined behavior > > that happens to usually work on the architectures using it, but if the > > compiler optimizes code based on the assumption that undefined behavior > > doesn't happen, it can create output that actually causes data corruption. > > > > A related problem was previously found on 32-bit ARMv7, where most > > instructions can be used on unaligned data, but 64-bit ldrd/strd causes > > an exception. The workaround was to always use the unaligned/le_struct.h > > helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM: > > 8715/1: add a private asm/unaligned.h"). > > > > The same solution should work on all other architectures as well, so > > remove the access-ok.h variant and use the other one unconditionally on > > all architectures, picking either the big-endian or little-endian version. > > FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to > copy between overlapping unaligned pointers > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994). Thank you for pointing this out > I'm not sure whether the kernel will run into that or not, and gcc has since > fixed it. But it's worth mentioning, especially since the issue mentioned in > this commit sounds very similar (overlapping unaligned pointers), and both > involved implementations of DEFLATE decompression. I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1 and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 Trying with both the original get_unaligned() version in there and the packed-struct variant, I get the same output from gcc-10.1 and gcc-10.3 when I compile those myself for arc hs4x , but it's rather different from the output that Vineet got and I don't know how to spot whether the problem exists in any of those versions. > Anyway, partly due to the above, in userspace I now only use memcpy() to > implement {get,put}_unaligned_*, since these days it seems to be compiled > optimally and have the least amount of problems. > > I wonder if the kernel should do the same, or whether there are still cases > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but > it was fixed in gcc 6. It would have to be memmove(), not memcpy() in this case, right? My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower code, we can live with that, unlike the possibility of gcc-10.{1,2} producing incorrect code. Since the new asm/unaligned.h has a single implementation across all architectures, we could probably fall back to a memmove based version for the compilers affected by the 94994 bug, but I'd first need to have a better way to test regarding whether given combination of asm/unaligned.h and compiler version runs into this bug. I have checked your reproducer and confirmed that it does affect x86_64 gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but does not trigger on any other version (4.9 though 9.3, 10.3 or 11.1), and not on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily mean it's safe on these. Arnd