Re: [PATCH stable 5.4,5.10] x86/fpu: Correct pkru/xstate inconsistency

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 15, 2022 at 2:45 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Feb 15, 2022 at 11:22:33AM -0800, Brian Geffon wrote:
> > When eagerly switching PKRU in switch_fpu_finish() it checks that
> > current is not a kernel thread as kernel threads will never use PKRU.
> > It's possible that this_cpu_read_stable() on current_task
> > (ie. get_current()) is returning an old cached value. To resolve this
> > reference next_p directly rather than relying on current.
> >
> > As written it's possible when switching from a kernel thread to a
> > userspace thread to observe a cached PF_KTHREAD flag and never restore
> > the PKRU. And as a result this issue only occurs when switching
> > from a kernel thread to a userspace thread, switching from a non kernel
> > thread works perfectly fine because all that is considered in that
> > situation are the flags from some other non kernel task and the next fpu
> > is passed in to switch_fpu_finish().
> >
> > This behavior only exists between 5.2 and 5.13 when it was fixed by a
> > rewrite decoupling PKRU from xstate, in:
> >   commit 954436989cc5 ("x86/fpu: Remove PKRU handling from switch_fpu_finish()")
> >
> > Unfortunately backporting the fix from 5.13 is probably not realistic as
> > it's part of a 60+ patch series which rewrites most of the PKRU handling.
> >
> > Fixes: 0cecca9d03c9 ("x86/fpu: Eager switch PKRU state")
> > Signed-off-by: Brian Geffon <bgeffon@xxxxxxxxxx>
> > Signed-off-by: Willis Kung <williskung@xxxxxxxxxx>
> > Tested-by: Willis Kung <williskung@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.4.x
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.10.x
> > ---
> >  arch/x86/include/asm/fpu/internal.h | 13 ++++++++-----
> >  arch/x86/kernel/process_32.c        |  6 ++----
> >  arch/x86/kernel/process_64.c        |  6 ++----
> >  3 files changed, 12 insertions(+), 13 deletions(-)
>
> So this is ONLY for 5.4.y and 5.10.y?  I'm really really loath to take
> non-upstream changes as 95% of the time (really) it goes wrong.

That's correct, this bug was introduced in 5.2 and that code was
completely refactored in 5.13 indirectly fixing it.

>
> How was this tested, and what do the maintainers of this subsystem
> think?  And will you be around to fix the bugs in this when they are
> found?

This has been trivial to reproduce, I've used a small repro which I've
put here: https://gist.github.com/bgaff/9f8cbfc8dd22e60f9492e4f0aff8f04f
, I also was able to reproduce this using the protection_keys self
tests on a 11th Gen Core i5-1135G7. I'm happy to commit to addressing
any bugs that may appear. I'll see what the maintainers say, but there
is also a smaller fix that just involves using this_cpu_read() in
switch_fpu_finish() for this specific issue, although that approach
isn't as clean.

>
> And finally, what's wrong with 60+ patches to backport to fix a severe
> issue?  What's preventing that from happening?  Did you try it and see
> what exactly is involved?

It was quite a substantial rewrite of that code with fixes layered on since.

>
> thanks,
>
> greg k-h



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux