Re: Patch "x86/nmi/64: Switch stacks on userspace NMI entry" has been added to the 4.1-stable tree

On Wed, Aug 12, 2015 at 06:03:26PM -0700, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> This is a note to let you know that I've just added the patch titled
> 
>     x86/nmi/64: Switch stacks on userspace NMI entry
> 
> to the 4.1-stable tree which can be found at:
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>      x86-nmi-64-switch-stacks-on-userspace-nmi-entry.patch
> and it can be found in the queue-4.1 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@xxxxxxxxxxxxxxx> know about it.
> 
> 
> From 9b6e6a8334d56354853f9c255d1395c2ba570e0a Mon Sep 17 00:00:00 2001
> From: Andy Lutomirski <luto@xxxxxxxxxx>
> Date: Wed, 15 Jul 2015 10:29:35 -0700
> Subject: x86/nmi/64: Switch stacks on userspace NMI entry
> 
> From: Andy Lutomirski <luto@xxxxxxxxxx>
> 
> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
> 
> Returning to userspace is tricky: IRET can fail, and ESPFIX can
> rearrange the stack prior to IRET.
> 
> The NMI nesting fixup relies on a precise stack layout and
> atomic IRET.  Rather than trying to teach the NMI nesting fixup
> to handle ESPFIX and failed IRET, punt: run NMIs that came from
> user mode on the normal kernel stack.
> 
> This will make some nested NMIs visible to C code, but the C
> code is okay with that.
> 
> As a side effect, this should speed up perf: it eliminates an
> RDMSR when NMIs come from user mode.
> 
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Reviewed-by: Borislav Petkov <bp@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> ---
>  arch/x86/kernel/entry_64.S |   61 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 57 insertions(+), 4 deletions(-)
> 
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1424,19 +1424,72 @@ ENTRY(nmi)
>  	 * a nested NMI that updated the copy interrupt stack frame, a
>  	 * jump will be made to the repeat_nmi code that will handle the second
>  	 * NMI.
> +	 *
> +	 * However, espfix prevents us from directly returning to userspace
> +	 * with a single IRET instruction.  Similarly, IRET to user mode
> +	 * can fault.  We therefore handle NMIs from user space like
> +	 * other IST entries.
>  	 */
>  
>  	/* Use %rdx as our temp variable throughout */
>  	pushq_cfi %rdx
>  	CFI_REL_OFFSET rdx, 0
>  
> +	testb	$3, CS-RIP+8(%rsp)
> +	jz	.Lnmi_from_kernel
> +
> +	/*
> +	 * NMI from user mode.  We need to run on the thread stack, but we
> +	 * can't go through the normal entry paths: NMIs are masked, and
> +	 * we don't want to enable interrupts, because then we'll end
> +	 * up in an awkward situation in which IRQs are on but NMIs
> +	 * are off.
> +	 */
> +
> +	SWAPGS
> +	cld
> +	movq	%rsp, %rdx
> +	movq	PER_CPU_VAR(kernel_stack), %rsp

Note that this line differs from what is in 4.2-rc and from Ben's
backported version for 4.0: 4.1 no longer has KERNEL_STACK_OFFSET, and
it does not yet have cpu_current_top_of_stack either.

So odds are this is wrong. If so, what should I do here for 4.1?
Backport the cpu_current_top_of_stack logic?

Hints greatly appreciated...

thanks,

greg k-h



> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
> +	pushq	3*8(%rdx)	/* pt_regs->flags */
> +	pushq	2*8(%rdx)	/* pt_regs->cs */
> +	pushq	1*8(%rdx)	/* pt_regs->rip */
> +	pushq   $-1		/* pt_regs->orig_ax */
> +	pushq   %rdi		/* pt_regs->di */
> +	pushq   %rsi		/* pt_regs->si */
> +	pushq   (%rdx)		/* pt_regs->dx */
> +	pushq   %rcx		/* pt_regs->cx */
> +	pushq   %rax		/* pt_regs->ax */
> +	pushq   %r8		/* pt_regs->r8 */
> +	pushq   %r9		/* pt_regs->r9 */
> +	pushq   %r10		/* pt_regs->r10 */
> +	pushq   %r11		/* pt_regs->r11 */
> +	pushq	%rbx		/* pt_regs->rbx */
> +	pushq	%rbp		/* pt_regs->rbp */
> +	pushq	%r12		/* pt_regs->r12 */
> +	pushq	%r13		/* pt_regs->r13 */
> +	pushq	%r14		/* pt_regs->r14 */
> +	pushq	%r15		/* pt_regs->r15 */
> +
> +	/*
> +	 * At this point we no longer need to worry about stack damage
> +	 * due to nesting -- we're on the normal thread stack and we're
> +	 * done with the NMI stack.
> +	 */
> +	movq	%rsp, %rdi
> +	movq	$-1, %rsi
> +	call	do_nmi
> +
>  	/*
> -	 * If %cs was not the kernel segment, then the NMI triggered in user
> -	 * space, which means it is definitely not nested.
> +	 * Return back to user mode.  We must *not* do the normal exit
> +	 * work, because we don't want to enable interrupts.  Fortunately,
> +	 * do_nmi doesn't modify pt_regs.
>  	 */
> -	cmpl $__KERNEL_CS, 16(%rsp)
> -	jne first_nmi
> +	SWAPGS
> +	jmp	restore_c_regs_and_iret
>  
> +.Lnmi_from_kernel:
>  	/*
>  	 * Check the special variable on the stack to see if NMIs are
>  	 * executing.
> 
> 
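
For readers following the quoted patch: the new entry test,
"testb $3, CS-RIP+8(%rsp)" followed by "jz .Lnmi_from_kernel", looks only
at the low two bits of the saved CS selector (the privilege level), whereas
the removed test compared the whole selector against __KERNEL_CS. A minimal
sketch of both checks is below. The helper names are hypothetical, and the
selector constants are the values 64-bit Linux uses, listed here only to
exercise the checks.

```python
# Selector values as used by 64-bit Linux (GDT index * 8 + RPL).
KERNEL_CS = 0x10   # GDT index 2, RPL 0
USER_CS = 0x33     # GDT index 6, RPL 3
USER32_CS = 0x23   # GDT index 4, RPL 3


def nmi_from_user(saved_cs: int) -> bool:
    """Hypothetical mirror of `testb $3, CS-RIP+8(%rsp)`.

    The low two bits of the saved CS hold the privilege level; any
    nonzero value means the NMI interrupted code outside ring 0.
    """
    return (saved_cs & 3) != 0


def old_check_not_kernel(saved_cs: int) -> bool:
    """Hypothetical mirror of the removed `cmpl $__KERNEL_CS, 16(%rsp)`."""
    return (saved_cs & 0xFFFFFFFF) != KERNEL_CS


if __name__ == "__main__":
    for name, cs in (("KERNEL_CS", KERNEL_CS),
                     ("USER_CS", USER_CS),
                     ("USER32_CS", USER32_CS)):
        print(name, hex(cs), "from user:", nmi_from_user(cs))
```

Both checks agree for these three selectors. The sketch only models the
comparison itself, not the stack offsets or the rest of the entry path.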
> Patches currently in stable-queue which might be from luto@xxxxxxxxxx are
> 
> queue-4.1/x86-nmi-enable-nested-do_nmi-handling-for-64-bit-kernels.patch
> queue-4.1/x86-nmi-64-switch-stacks-on-userspace-nmi-entry.patch
> queue-4.1/x86-nmi-64-remove-asm-code-that-saves-cr2.patch
> queue-4.1/x86-nmi-64-use-df-to-avoid-userspace-rsp-confusing-nested-nmi-detection.patch
> queue-4.1/x86-asm-entry-64-remove-pointless-jump-to-irq_return.patch
> queue-4.1/x86-nmi-64-reorder-nested-nmi-checks.patch
> queue-4.1/x86-nmi-64-improve-nested-nmi-comments.patch
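
As an aside on the pushq sequence in the quoted patch: because the stack
grows down, the last value pushed (r15) ends up at the lowest address, so
the frame read upward from the final %rsp lists the fields in the reverse
of the push order. The sketch below simulates that with a byte buffer. It
is only an illustration: field names follow the comments in the patch,
build_frame and PtRegs are hypothetical names, and the real struct pt_regs
definition in the kernel headers is authoritative.

```python
import ctypes
import struct

# Order in which the patch pushes values onto the thread stack.
PUSH_ORDER = [
    "ss", "rsp", "flags", "cs", "rip", "orig_ax",
    "di", "si", "dx", "cx", "ax",
    "r8", "r9", "r10", "r11",
    "rbx", "rbp", "r12", "r13", "r14", "r15",
]


class PtRegs(ctypes.Structure):
    # Lowest address first, i.e. the reverse of the push order.
    _fields_ = [(name, ctypes.c_uint64) for name in reversed(PUSH_ORDER)]


def build_frame(values, stack_size=512):
    """Simulate the pushes; return the overlaid frame and the final rsp."""
    stack = bytearray(stack_size)
    rsp = stack_size                      # empty stack: rsp at the top
    for name in PUSH_ORDER:
        rsp -= 8                          # pushq decrements rsp first
        struct.pack_into("=Q", stack, rsp,
                         values[name] & 0xFFFFFFFFFFFFFFFF)
    # `movq %rsp, %rdi` hands this address to do_nmi as the pt_regs pointer.
    return PtRegs.from_buffer(stack, rsp), rsp


if __name__ == "__main__":
    vals = {name: i + 1 for i, name in enumerate(PUSH_ORDER)}
    vals["orig_ax"] = -1                  # the patch pushes $-1 here
    regs, rsp = build_frame(vals)
    print("frame size:", ctypes.sizeof(PtRegs), "bytes")
    print("offset of ss:", PtRegs.ss.offset)
```

The 21 eight-byte slots give a 168-byte frame, with r15 at offset 0 and ss
at offset 160 from the pointer passed to do_nmi.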