On Thu, Jan 14, 2021 at 08:31:30AM -0800, Andy Lutomirski wrote: > This is gross. I realize this is only used for old CPUs that we don't > care about perf-wise Performance might be still important for embedded systems (Geode LX seems to be supported "until at least 2021"). > , but this code is nonsense -- it makes absolutely > to sense to put this absurd condition here to work around > kernel_fpu_begin() bugs. Yes, it's nonsense. But I think that the kernel might have a silent assumption, that the FPU cannot be used too early. > If we want to use MMX, we should check MMX. > And we should also check the correct condition re: irqs. So this code > should be: > > if (boot_cpu_has(X86_FEATURE_XMM) && irq_fpu_usable()) { > kernel_fpu_begin_mask(FPU_MASK_XMM); > ... This code does not use SSE (XMM). It uses only MMX and 3DNow!, so XMM check is not needed here. And this code is enabled only when a processor with 3DNow! is selected ("lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o" in Makefile). So: if (!irq_fpu_usable()) fallback_to_non_mmx_code(); is sufficient. But the original: if (!in_interrupt()) fallback_to_non_mmx_code(); is also valid. Changing it to irq_fpu_usable() might allow using MMX variant in more cases and even improve performance. But it can also introduce some regressions. > or we could improve code gen by adding a try_kernel_fpu_begin_mask() > so we can do a single call instead of two. > [...] > What do you all think? If you're generally in favor, I can write the > code and do a quick audit for other weird cases in the kernel. I think that the cleanest approach would be introducing fpu_usable(), which returns 0 (or false) if FPU is not initialized yet. Then irq_fpu_usable() could be changed to: bool irq_fpu_usable(void) { return fpu_usable() && (!in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle()); } and in the _mmx_memcpy(): - if (unlikely(in_interrupt())) + if (unlikely(irq_fpu_usable())) return __memcpy(to, from, len); try_kernel_fpu_begin(), even without mask, is also a good idea. If the FPU is not available yet (FPU not initizalized yet) it can fail and a fallback code would be used. However, in some cases fpu_usable() + kernel_fpu_begin() might be better, if between fpu_usable() check and kernel_fpu_begin() long preparation is required. (kernel_fpu_begin() disables preemption). In the mmx_32.c case try_kernel_fpu_begin() looks good, only two simple lines are added to a section with disabled preemption: diff --git a/arch/x86/lib/mmx_32.c b/arch/x86/lib/mmx_32.c index 4321fa02e18d..9c6dadd2b2ef 100644 --- a/arch/x86/lib/mmx_32.c +++ b/arch/x86/lib/mmx_32.c @@ -31,14 +31,12 @@ void *_mmx_memcpy(void *to, const void *from, size_t len) void *p; int i; - if (unlikely(in_interrupt())) + if (unlikely(try_kernel_fpu_begin())) return __memcpy(to, from, len); p = to; i = len >> 6; /* len/64 */ - kernel_fpu_begin(); - __asm__ __volatile__ ( "1: prefetch (%0)\n" /* This set is 28 bytes */ " prefetch 64(%0)\n" And if try_kernel_fpu_begin_mask() with a mask will allow for some optimizations it also might be a good idea. > Or we could flip the OSFSXR bit very early, I suppose. I think that my original static_branch_likely() workaround might be the simplest and touches only mmx_32.c. Such approach is already used in the kernel, for instance in the crypto code: $ git grep static_branch arch/x86/crypto/ And maybe using FPU too early should be forbidden. Best regards, Krzysiek