From: Sebastian Andrzej Siewior
> Sent: 09 January 2019 11:47
>
> This is a refurbished series originally started by Rik van Riel. The
> goal is to load the FPU registers on return to userland and not on every
> context switch. By this optimisation we can:
> - avoid loading the registers if the task stays in kernel and does
>   not return to userland
> - make kernel_fpu_begin() cheaper: it only saves the registers on the
>   first invocation. The second invocation does not need to save them again.
>
> To access the FPU registers in kernel we need to:
> - disable preemption so that the scheduler cannot switch tasks. If it
>   did, it would set TIF_NEED_FPU_LOAD and the FPU registers would no
>   longer be valid.
> - disable BH because the softirq might use kernel_fpu_begin() and then
>   set TIF_NEED_FPU_LOAD instead of loading the FPU registers on completion.

Once this is done it might be worthwhile adding a parameter to
kernel_fpu_begin() to request the registers only when they don't
need saving.
This would benefit code paths where the gains are reasonable but not massive.

The return value from kernel_fpu_begin() ought to indicate which
registers are available - none, SSE, SSE2, AVX, AVX512 etc. - so that
code can use an appropriate implementation.
(I've not looked to see if this is already the case!)

	David
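
A minimal sketch of the two suggestions above, written as a userspace model rather than kernel code. Everything except the name TIF_NEED_FPU_LOAD is an assumption made for illustration: the KFPU_* feature bits, the KFPU_ONLY_IF_NO_SAVE flag, kernel_fpu_begin_mask(), kernel_fpu_end_sketch(), return_to_user_sketch() and pick_impl() are hypothetical and do not exist in the kernel. Preemption and BH handling are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits a begin call could report back. */
enum kfpu_features {
	KFPU_NONE   = 0,
	KFPU_SSE    = 1u << 0,
	KFPU_SSE2   = 1u << 1,
	KFPU_AVX    = 1u << 2,
	KFPU_AVX512 = 1u << 3,
};

/* Hypothetical flag: only hand out the registers if no save is needed. */
#define KFPU_ONLY_IF_NO_SAVE 0x1u

/* Simulated state; the real kernel tracks this per task and per CPU. */
static bool need_fpu_load;	/* models TIF_NEED_FPU_LOAD being set */
static bool fpu_in_use;		/* inside a begin/end section */
static unsigned int save_count;	/* how often user state was "saved" */
static unsigned int cpu_features = KFPU_SSE | KFPU_SSE2 | KFPU_AVX;

/* Returns the usable feature mask, or KFPU_NONE if the caller must not
 * touch the FPU (in which case no matching end call is needed). */
static unsigned int kernel_fpu_begin_mask(unsigned int flags)
{
	assert(!fpu_in_use);
	if (!need_fpu_load) {
		/* User registers are still live, so a save would be needed. */
		if (flags & KFPU_ONLY_IF_NO_SAVE)
			return KFPU_NONE;
		save_count++;		/* stands in for saving the registers */
		need_fpu_load = true;	/* reload on return to userland */
	}
	fpu_in_use = true;
	return cpu_features;
}

static void kernel_fpu_end_sketch(void)
{
	assert(fpu_in_use);	/* catch an unbalanced end */
	fpu_in_use = false;
}

/* Models the return to userland: registers are reloaded, flag cleared. */
static void return_to_user_sketch(void)
{
	need_fpu_load = false;
}

/* Example caller: picks the widest implementation on offer and returns
 * its feature bit, or KFPU_NONE for the integer-only fallback. */
static unsigned int pick_impl(unsigned int flags)
{
	unsigned int avail = kernel_fpu_begin_mask(flags);
	unsigned int chosen = KFPU_NONE;

	if (avail == KFPU_NONE)
		return KFPU_NONE;
	if (avail & KFPU_AVX512)
		chosen = KFPU_AVX512;
	else if (avail & KFPU_AVX)
		chosen = KFPU_AVX;
	else if (avail & KFPU_SSE2)
		chosen = KFPU_SSE2;
	else if (avail & KFPU_SSE)
		chosen = KFPU_SSE;
	/* ... the vectorised work would run here ... */
	kernel_fpu_end_sketch();
	return chosen;
}
```

In this model a caller passing KFPU_ONLY_IF_NO_SAVE gets KFPU_NONE while the user registers are still live, and gets the full mask once an earlier caller has already paid for the save. Whether a bitmask or a single "best level" value is the better return type is an open design question.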