On Tue, Feb 15, 2022 at 07:36:44AM -0800, Brian Geffon wrote: > There are two issues with PKRU handling prior to 5.13. The first is that > when eagerly switching PKRU we check that current is not a kernel > thread as kernel threads will never use PKRU. It's possible that > this_cpu_read_stable() on current_task (ie. get_current()) is returning > an old cached value. By forcing the read with this_cpu_read() the > correct task is used. Without this it's possible when switching from > a kernel thread to a userspace thread that we'll still observe the > PF_KTHREAD flag and never restore the PKRU. And as a result this > issue only occurs when switching from a kernel thread to a userspace > thread, switching from a non kernel thread works perfectly fine because > all we consider in that situation is the flags from some other non > kernel task and the next fpu is passed in to switch_fpu_finish(). > > Without reloading the value finish_fpu_load() after being inlined into > __switch_to() uses a stale value of current: > > ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi > ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13) > bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb> > bb2: 41 f6 45 3e 20 testb $0x20,0x3e(%r13) > bb7: 75 1c jne bd5 <__switch_to+0x20e> > > By using this_cpu_read() and avoiding the cached value the compiler does > insert an additional load instruction and observes the correct value now: > > ba1: 8b 35 00 00 00 00 mov 0x0(%rip),%esi > ba7: f0 41 80 4d 01 40 lock orb $0x40,0x1(%r13) > bad: e9 00 00 00 00 jmp bb2 <__switch_to+0x1eb> > bb2: 65 48 8b 05 00 00 00 mov %gs:0x0(%rip),%rax > bb9: 00 > bba: f6 40 3e 20 testb $0x20,0x3e(%rax) > bbe: 75 1c jne bdc <__switch_to+0x215> > > The second issue is when using write_pkru() we only write to the > xstate when the feature bit is set because get_xsave_addr() returns > NULL when the feature bit is not set. This is problematic as the CPU > is free to clear the feature bit when it observes the xstate in the > init state, this behavior seems to be documented a few places throughout > the kernel. If the bit was cleared then in write_pkru() we would happily > write to PKRU without ever updating the xstate, and the FPU restore on > return to userspace would load the old value agian. > > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state") > Signed-off-by: Brian Geffon <bgeffon@xxxxxxxxxx> > Signed-off-by: Willis Kung <williskung@xxxxxxxxxx> > Tested-by: Willis Kung <williskung@xxxxxxxxxx> > --- > arch/x86/include/asm/fpu/internal.h | 2 +- > arch/x86/include/asm/pgtable.h | 14 ++++++++++---- > 2 files changed, 11 insertions(+), 5 deletions(-) <formletter> This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly. </formletter>