On Sun, Sep 29, 2019 at 7:42 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote:
>
> I had previously put quite some effort into the simd_get, simd_put,
> simd_relax mechanism, so that the simd state could be persisted during
> both several calls to the same function and within long loops like
> below, with simd_relax existing to reenable preemption briefly if
> things were getting out of hand. Ard got rid of this and has moved the
> kernel_fpu_begin and kernel_fpu_end calls into the inner loop:

Actually, that should be ok these days.

What has happened fairly recently (it got merged into 5.2 back in May)
is that we no longer do the FPU save/restore on each
kernel_fpu_begin/end.

Instead, we save it on kernel_fpu_begin(), and set a flag that it needs
to be restored when returning to user space.

So the kernel now on its own merges that FPU save/restore overhead when
you do it repeatedly.

The core change happened in 5f409e20b794 ("x86/fpu: Defer FPU state
load until return to userspace") but there are a few commits around it
for cleanups etc. The code was merged in 8ff468c29e9a ("Merge branch
'x86-fpu-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") if you want to
see the whole series.

That said, it would be _lovely_ if you or somebody else actually
double-checked that it works as expected and that the numbers bear out
the improvements.

It should be superior to the old model of manually trying to merge FPU
use regions, both from a performance angle (because it will merge much
more), but also from a code simplicity angle and the whole preemption
latency worry also basically goes away.

             Linus