This is a refurbished series originally started by Rik van Riel. The goal is
to load the FPU registers on return to userland and not on every context
switch. With this optimisation we can:

- avoid loading the registers if the task stays in the kernel and does not
  return to userland
- make kernel_fpu_begin() cheaper: it only saves the registers on the first
  invocation. The second invocation does not need to save them again.

To access the FPU registers in the kernel we need to:

- disable preemption so that the scheduler cannot switch tasks. A task switch
  would set TIF_LOAD_FPU and the FPU registers would no longer be valid.
- disable BH because a softirq might use kernel_fpu_begin() and then set
  TIF_LOAD_FPU instead of loading the FPU registers on completion.

This seems to work with userland & xmm registers. I haven't tested the pkeys
feature and KVM yet.

Sebastian
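
For illustration, the deferred-load idea described above can be modelled in user space roughly as follows. This is a toy sketch and not code from the series: every model_* name, the four-integer "register file" and the atomic_depth counter (standing in for disabled preemption and BH) are assumptions made up for the example. Only the TIF_LOAD_FPU idea and the save-once behaviour of kernel_fpu_begin() come from the description above.

```c
/* Toy user-space model, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NREGS 4

struct model_task {
	int  saved[NREGS];	/* in-memory copy of the task's FPU state */
	bool load_fpu;		/* stands in for TIF_LOAD_FPU */
};

static int hw_regs[NREGS];	/* stands in for the CPU's FPU registers */
static int save_count, load_count;
static int atomic_depth;	/* > 0: preemption and BH count as disabled */

static void model_fpu_lock(void)   { atomic_depth++; }
static void model_fpu_unlock(void) { assert(atomic_depth > 0); atomic_depth--; }

static void model_save(struct model_task *t)
{
	memcpy(t->saved, hw_regs, sizeof(hw_regs));
	save_count++;
}

static void model_load(struct model_task *t)
{
	memcpy(hw_regs, t->saved, sizeof(hw_regs));
	load_count++;
}

/* Context switch: save prev's live registers once, defer loading next's. */
static void model_switch_to(struct model_task *prev, struct model_task *next)
{
	if (!prev->load_fpu) {
		model_save(prev);
		prev->load_fpu = true;
	}
	next->load_fpu = true;
}

/* Kernel FPU use: only the first call finds live user state to save. */
static void model_kernel_fpu_begin(struct model_task *cur)
{
	model_fpu_lock();
	if (!cur->load_fpu) {
		model_save(cur);
		cur->load_fpu = true;
	}
	memset(hw_regs, 0xff, sizeof(hw_regs));	/* kernel clobbers the registers */
}

/* No reload here; that is deferred until the return to userland. */
static void model_kernel_fpu_end(void)
{
	model_fpu_unlock();
}

/* Return to userland: the only place where registers are loaded. */
static void model_return_to_user(struct model_task *cur)
{
	if (cur->load_fpu) {
		model_load(cur);
		cur->load_fpu = false;
	}
}
```

In this model two back-to-back model_kernel_fpu_begin()/model_kernel_fpu_end() pairs cause a single save, and a task that is switched in and out again without reaching model_return_to_user() never triggers a load.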