On Thu, Jan 14, 2021 at 10:44:25AM +0100, Borislav Petkov wrote: > I believe the correct fix should be > > if (unlikely(in_interrupt()) || !(cr4_read_shadow() & X86_CR4_OSFXSR)) > return __memcpy(to, from, len); > > in _mmx_memcpy() as you had it in your first patch. > The OSFXSR must be set only on CPUs with SSE. There are some CPUs with 3DNow!, but without SSE and FXSR (like AMD Geode LX, which is still used in many embedded systems). So, I've changed that to: if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) && unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR)))) However, I'm not sure that adding two taken branches (that second unlikely() does not help gcc to generate better code) and a call to cr4_read_shadow() (not inlined) is a good idea. The static branch adds a single jump, which is changed to NOP. The code with boot_cpu_has() and CR4 checks adds: c09932ad: a1 50 50 c3 c0 mov 0xc0c35050,%eax c09932b2: a9 00 00 00 02 test $0x2000000,%eax /* this branch is taken */ c09932b7: 0f 85 13 01 00 00 jne c09933d0 <_mmx_memcpy+0x140> c09932bd: /* MMX code is here */ /* cr4_read_shadow is not inlined */ c09933d0: e8 8b 01 78 ff call c0113560 <cr4_read_shadow> c09933d5: f6 c4 02 test $0x2,%ah /* this branch is also taken */ c09933d8: 0f 85 df fe ff ff jne c09932bd <_mmx_memcpy+0x2d> However, except for some embedded systems, probably almost nobody uses that code today, and the most imporant is avoiding future breakage. And I'm not sure which approach is better. Because that CR4 test tests for a feature that is not used in mmx_memcpy(), but it's used in kernel_fpu_begin(). And in future kernel_fpu_begin() may change and require also other stuff. So I think that the best approach would be delay any FPU optimized memcpy() after fpu__init_system() is executed. Best regards, Krzysiek -- >8 -- Subject: [PATCH] x86/lib: don't use mmx_memcpy() to early The MMX 3DNow! optimized memcpy() is used very early, even before FPU is initialized in the kernel. It worked fine, but commit 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()") broke that. After that commit the kernel_fpu_begin() assumes that FXSR is enabled in the CR4 register on all processors with SSE. Because memcpy() is used before FXSR is enabled, the kernel crashes just after "Booting the kernel." message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when some AMD/Cyrix processors are selected) on processors with SSE (like AMD K7, which supports both MMX 3DNow! and SSE). Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> Cc: Ingo Molnar <mingo@xxxxxxxxxx> Cc: Borislav Petkov <bp@xxxxxxxxx> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> Signed-off-by: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx> --- arch/x86/lib/mmx_32.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c index 4321fa02e18d..70aa769570e6 100644 --- a/arch/x86/lib/mmx_32.c +++ b/arch/x86/lib/mmx_32.c @@ -25,13 +25,20 @@ #include <asm/fpu/api.h> #include <asm/asm.h> +#include <asm/tlbflush.h> void *_mmx_memcpy(void *to, const void *from, size_t len) { void *p; int i; - if (unlikely(in_interrupt())) + /* + * kernel_fpu_begin() assumes that FXSR is enabled on all processors + * with SSE. Thus, MMX-optimized version can't be used + * before the kernel enables FXSR (OSFXSR bit in the CR4 register). + */ + if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) && + unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR)))) return __memcpy(to, from, len); p = to; -- 2.27.0.rc1.207.gb85828341f