Re: Patch "x86/nmi/64: Switch stacks on userspace NMI entry" has been added to the 4.1-stable tree

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Thu, 13 Aug 2015 11:51:12 -0700

On Wed, Aug 12, 2015 at 6:08 PM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Aug 12, 2015 at 06:03:26PM -0700, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
>>
>> This is a note to let you know that I've just added the patch titled
>>
>>     x86/nmi/64: Switch stacks on userspace NMI entry
>>
>> to the 4.1-stable tree which can be found at:
>>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
>>
>> The filename of the patch is:
>>      x86-nmi-64-switch-stacks-on-userspace-nmi-entry.patch
>> and it can be found in the queue-4.1 subdirectory.
>>
>> If you, or anyone else, feels it should not be added to the stable tree,
>> please let <stable@xxxxxxxxxxxxxxx> know about it.
>>
>>
>> From 9b6e6a8334d56354853f9c255d1395c2ba570e0a Mon Sep 17 00:00:00 2001
>> From: Andy Lutomirski <luto@xxxxxxxxxx>
>> Date: Wed, 15 Jul 2015 10:29:35 -0700
>> Subject: x86/nmi/64: Switch stacks on userspace NMI entry
>>
>> From: Andy Lutomirski <luto@xxxxxxxxxx>
>>
>> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
>>
>> Returning to userspace is tricky: IRET can fail, and ESPFIX can
>> rearrange the stack prior to IRET.
>>
>> The NMI nesting fixup relies on a precise stack layout and
>> atomic IRET.  Rather than trying to teach the NMI nesting fixup
>> to handle ESPFIX and failed IRET, punt: run NMIs that came from
>> user mode on the normal kernel stack.
>>
>> This will make some nested NMIs visible to C code, but the C
>> code is okay with that.
>>
>> As a side effect, this should speed up perf: it eliminates an
>> RDMSR when NMIs come from user mode.
>>
>> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
>> Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
>> Reviewed-by: Borislav Petkov <bp@xxxxxxx>
>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>>
>> ---
>>  arch/x86/kernel/entry_64.S |   61 ++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 57 insertions(+), 4 deletions(-)
>>
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1424,19 +1424,72 @@ ENTRY(nmi)
>>        * a nested NMI that updated the copy interrupt stack frame, a
>>        * jump will be made to the repeat_nmi code that will handle the second
>>        * NMI.
>> +      *
>> +      * However, espfix prevents us from directly returning to userspace
>> +      * with a single IRET instruction.  Similarly, IRET to user mode
>> +      * can fault.  We therefore handle NMIs from user space like
>> +      * other IST entries.
>>        */
>>
>>       /* Use %rdx as our temp variable throughout */
>>       pushq_cfi %rdx
>>       CFI_REL_OFFSET rdx, 0
>>
>> +     testb   $3, CS-RIP+8(%rsp)
>> +     jz      .Lnmi_from_kernel
>> +
>> +     /*
>> +      * NMI from user mode.  We need to run on the thread stack, but we
>> +      * can't go through the normal entry paths: NMIs are masked, and
>> +      * we don't want to enable interrupts, because then we'll end
>> +      * up in an awkward situation in which IRQs are on but NMIs
>> +      * are off.
>> +      */
>> +
>> +     SWAPGS
>> +     cld
>> +     movq    %rsp, %rdx
>> +     movq    PER_CPU_VAR(kernel_stack), %rsp
>
> Note, this differs from what is in 4.2-rc, and what was in Ben's
> backported version for 4.0 because we don't have a KERNEL_STACK_OFFSET
> anymore in 4.1, and we don't yet have cpu_current_top_of_stack either.
>
> So odds are, this is wrong, but if so, what should I do here for 4.1?
> Backport the cpu_current_top_of_stack logic?

I haven't tested this directly, but it looks correct.  In 4.1,
KERNEL_STACK_OFFSET was removed, which is equivalent to it being zero,
so loading kernel_stack into %rsp with no further adjustment is fine.
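
To illustrate the point, here is a rough userspace model in C (not
kernel code).  It assumes that on older kernels the per-CPU kernel_stack
value sat KERNEL_STACK_OFFSET bytes below the top of the thread stack, so
the entry code had to add the offset back.  The function names, the
MODEL_THREAD_SIZE constant and the addresses are made up for this sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stack size for this model only; not read from any tree. */
#define MODEL_THREAD_SIZE 16384UL

/*
 * The low two bits of a saved CS selector carry the privilege level.
 * A nonzero value means the NMI interrupted user mode, which is what
 * the quoted "testb $3, CS-RIP+8(%rsp)" checks.
 */
static bool nmi_from_user(uint16_t saved_cs)
{
	return (saved_cs & 3) != 0;
}

/*
 * Model of the per-CPU kernel_stack value: the top of the thread stack
 * minus an offset.  Passing offset == 0 models the 4.1 situation
 * described above.
 */
static uintptr_t model_kernel_stack(uintptr_t stack_base, uintptr_t offset)
{
	return stack_base + MODEL_THREAD_SIZE - offset;
}

/*
 * Stack pointer the NMI entry path wants: the true top of the thread
 * stack.  With offset == 0 this is just kernel_stack itself, so a plain
 * "movq PER_CPU_VAR(kernel_stack), %rsp" is enough.
 */
static uintptr_t model_entry_rsp(uintptr_t kernel_stack, uintptr_t offset)
{
	return kernel_stack + offset;
}
```

With a zero offset, model_entry_rsp() returns kernel_stack unchanged,
which is why the 4.1 version of the patch needs no extra add after the
movq.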

--Andy