On Fri, Jul 19, 2019 at 1:17 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jul 18, 2019 at 02:34:44PM -0700, Nick Desaulniers wrote:
> > On Wed, Jul 17, 2019 at 5:02 PM Vaibhav Rustagi
> > <vaibhavrustagi@xxxxxxxxxx> wrote:
> > >
> > > Compiling the purgatory code with clang results in using of mmx
> > > registers.
> > >
> > > $ objdump -d arch/x86/purgatory/purgatory.ro | grep xmm
> > >
> > >      112:       0f 28 00                movaps (%rax),%xmm0
> > >      115:       0f 11 07                movups %xmm0,(%rdi)
> > >      122:       0f 28 00                movaps (%rax),%xmm0
> > >      125:       0f 11 47 10             movups %xmm0,0x10(%rdi)
> > >
> > > Add -mno-sse, -mno-mmx, -mno-sse2 to avoid generating SSE instructions.
> > >
> > > Signed-off-by: Vaibhav Rustagi <vaibhavrustagi@xxxxxxxxxx>
> > > ---
> > >  arch/x86/purgatory/Makefile | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> > > index 3cf302b26332..3589ec4a28c7 100644
> > > --- a/arch/x86/purgatory/Makefile
> > > +++ b/arch/x86/purgatory/Makefile
> > > @@ -20,6 +20,7 @@ KCOV_INSTRUMENT := n
> > >  # sure how to relocate those. Like kexec-tools, use custom flags.
> > >
> > >  KBUILD_CFLAGS := -fno-strict-aliasing -Wall -Wstrict-prototypes -fno-zero-initialized-in-bss -fno-builtin -ffreestanding -c -Os -mcmodel=large
> > > +KBUILD_CFLAGS += -mno-mmx -mno-sse -mno-sse2
> >
> > Yep, this is a commonly recurring bug in the kernel, observed again
> > and again for Clang builds. The top level Makefile carefully sets
> > KBUILD_CFLAGS, then lower subdirs in the kernel wipe them away with
> > `:=` assignment. Invariably important flags don't always get re-added.
> > In this case, these flags are used in arch/x86/Makefile, but not here
> > and should be IMO. Thanks for the patch.
>
> Should we then not fix/remove these := assignments?

Good point, it's actually pretty straightforward to do so.
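
To illustrate the `:=` versus `+=` behavior discussed in the quoted review,
here is a toy, standalone Makefile. It is only a sketch and not the actual
kernel change: the variable names WIPED_CFLAGS and KEPT_CFLAGS, the `show`
target, and the shortened flag lists are all made up for the example.

```make
# Toy stand-in for flags set higher up in the build (illustrative values,
# not the kernel's real flag set).
KBUILD_CFLAGS := -mno-sse -mno-mmx -mno-sse2

# Subdirectory style 1: a second `:=` replaces the value, so the inherited
# flags are lost unless someone re-adds them by hand.
WIPED_CFLAGS := $(KBUILD_CFLAGS)
WIPED_CFLAGS := -ffreestanding -Os

# Subdirectory style 2: `+=` appends, so the inherited flags survive.
KEPT_CFLAGS := $(KBUILD_CFLAGS)
KEPT_CFLAGS += -ffreestanding -Os

# Hypothetical helper target that prints both results.
.PHONY: show
show:
	@echo "wiped: $(WIPED_CFLAGS)"
	@echo "kept:  $(KEPT_CFLAGS)"
```

Running `make show` on this file should print only `-ffreestanding -Os` for
the wiped variant, and all five flags for the kept one.
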
Doing that will just invert the order of patches in the series, since
the memcpy/memset infinite recursion is then guaranteed with
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y (without the other patch in this
series).

Did the x86 maintainers have thoughts on their favorite implementation
of memset/memcpy for me to use, from the thread on the other patch in
the series? I'll just resend with this fix, and maybe we can discuss
there and spin a v3 if needed.

-- 
Thanks,
~Nick Desaulniers