On Thu, Jan 14, 2021 at 6:51 AM Krzysztof Mazur <krzysiek@xxxxxxxxxxxx> wrote: > > On Thu, Jan 14, 2021 at 03:07:37PM +0100, Borislav Petkov wrote: > > On Thu, Jan 14, 2021 at 01:36:57PM +0100, Krzysztof Mazur wrote: > > > The OSFXSR must be set only on CPUs with SSE. There > > > are some CPUs with 3DNow!, but without SSE and FXSR (like AMD > > > Geode LX, which is still used in many embedded systems). > > > So, I've changed that to: > > > > > > if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) && > > > unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR)))) > > > > Why? > > > > X86_CR4_OSFXSR won't ever be set on those CPUs but the test will be > > performed anyway. So there's no need for boot_cpu_has(). > > Because the MMX version should be always used on those CPUs, even without > OSFXSR set. If the CPU does not support SSE, it is safe to > call kernel_fpu_begin() without OSFXSR set. > "!(cr4_read_shadow() & X86_CR4_OSFXSR)" will be always true on > those CPUs, and without boot_cpu_has() MMX version will be never used. > > There are two cases: > > 3DNow! without SSE always use MMX version > 3DNow! + SSE (K7) use MMX version only if FXSR is enabled > > Thanks. > > Best regards, > Krzysiek > -- >8 -- > Subject: [PATCH] x86/lib: don't use mmx_memcpy() too early > > The MMX 3DNow! optimized memcpy() is used very early, > even before FPU is initialized in the kernel. It worked fine, but commit > 7ad816762f9bf89e940e618ea40c43138b479e10 ("x86/fpu: Reset MXCSR > to default in kernel_fpu_begin()") broke that. After that > commit the kernel_fpu_begin() assumes that FXSR is enabled in > the CR4 register on all processors with SSE. Because memcpy() is used > before FXSR is enabled, the kernel crashes just after "Booting the kernel." > message. It affects all kernels with CONFIG_X86_USE_3DNOW (enabled when > some AMD/Cyrix processors are selected) on processors with SSE > (like AMD K7, which supports both MMX 3DNow! and SSE). > > Fixes: 7ad816762f9b ("x86/fpu: Reset MXCSR to default in kernel_fpu_begin()") > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > Cc: Borislav Petkov <bp@xxxxxxxxx> > Cc: "H. Peter Anvin" <hpa@xxxxxxxxx> > Cc: <stable@xxxxxxxxxxxxxxx> # 5.8+ > Signed-off-by: Krzysztof Mazur <krzysiek@xxxxxxxxxxxx> > --- > arch/x86/lib/mmx_32.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c > index 4321fa02e18d..70aa769570e6 100644 > --- a/arch/x86/lib/mmx_32.c > +++ b/arch/x86/lib/mmx_32.c > @@ -25,13 +25,20 @@ > > #include <asm/fpu/api.h> > #include <asm/asm.h> > +#include <asm/tlbflush.h> > > void *_mmx_memcpy(void *to, const void *from, size_t len) > { > void *p; > int i; > > - if (unlikely(in_interrupt())) > + /* > + * kernel_fpu_begin() assumes that FXSR is enabled on all processors > + * with SSE. Thus, MMX-optimized version can't be used > + * before the kernel enables FXSR (OSFXSR bit in the CR4 register). > + */ > + if (unlikely(in_interrupt()) || (boot_cpu_has(X86_FEATURE_XMM) && > + unlikely(!(cr4_read_shadow() & X86_CR4_OSFXSR)))) This is gross. I realize this is only used for old CPUs that we don't care about perf-wise, but this code is nonsense -- it makes absolutely to sense to put this absurd condition here to work around kernel_fpu_begin() bugs. If we want to use MMX, we should check MMX. And we should also check the correct condition re: irqs. So this code should be: if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) { kernel_fpu_begin_mask(FPU_MASK_XMM); ... or we could improve code gen by adding a try_kernel_fpu_begin_mask() so we can do a single call instead of two. This also mostly fixes our little performance regression -- we'd make kernel_fpu_begin() be an inline wrapper around kernel_fpu_begin_mask(FPU_MASK_DEFAULTFP), and *that* would be FPU_MASK_XYZMM on 64-bit and FPU_MASK_387 on 32-bit. (Okay, XYZMM isn't a great name, but whatever.) And the EFI code can become completely explicit: kernel_fpu_begin(FPU_MASK_ALL). What do you all think? If you're generally in favor, I can write the code and do a quick audit for other weird cases in the kernel. Or we could flip the OSFSXR bit very early, I suppose. --Andy