On Wed, Aug 12, 2015 at 06:03:26PM -0700, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> This is a note to let you know that I've just added the patch titled
> 
>     x86/nmi/64: Switch stacks on userspace NMI entry
> 
> to the 4.1-stable tree which can be found at:
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>     x86-nmi-64-switch-stacks-on-userspace-nmi-entry.patch
> and it can be found in the queue-4.1 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@xxxxxxxxxxxxxxx> know about it.
> 
> 
> >From 9b6e6a8334d56354853f9c255d1395c2ba570e0a Mon Sep 17 00:00:00 2001
> From: Andy Lutomirski <luto@xxxxxxxxxx>
> Date: Wed, 15 Jul 2015 10:29:35 -0700
> Subject: x86/nmi/64: Switch stacks on userspace NMI entry
> 
> From: Andy Lutomirski <luto@xxxxxxxxxx>
> 
> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
> 
> Returning to userspace is tricky: IRET can fail, and ESPFIX can
> rearrange the stack prior to IRET.
> 
> The NMI nesting fixup relies on a precise stack layout and
> atomic IRET.  Rather than trying to teach the NMI nesting fixup
> to handle ESPFIX and failed IRET, punt: run NMIs that came from
> user mode on the normal kernel stack.
> 
> This will make some nested NMIs visible to C code, but the C
> code is okay with that.
> 
> As a side effect, this should speed up perf: it eliminates an
> RDMSR when NMIs come from user mode.
> 
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Reviewed-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Reviewed-by: Borislav Petkov <bp@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> ---
>  arch/x86/kernel/entry_64.S |   61 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 57 insertions(+), 4 deletions(-)
> 
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1424,19 +1424,72 @@ ENTRY(nmi)
>  	 * a nested NMI that updated the copy interrupt stack frame, a
>  	 * jump will be made to the repeat_nmi code that will handle the second
>  	 * NMI.
> +	 *
> +	 * However, espfix prevents us from directly returning to userspace
> +	 * with a single IRET instruction.  Similarly, IRET to user mode
> +	 * can fault.  We therefore handle NMIs from user space like
> +	 * other IST entries.
>  	 */
> 
>  	/* Use %rdx as our temp variable throughout */
>  	pushq_cfi %rdx
>  	CFI_REL_OFFSET rdx, 0
> 
> +	testb	$3, CS-RIP+8(%rsp)
> +	jz	.Lnmi_from_kernel
> +
> +	/*
> +	 * NMI from user mode.  We need to run on the thread stack, but we
> +	 * can't go through the normal entry paths: NMIs are masked, and
> +	 * we don't want to enable interrupts, because then we'll end
> +	 * up in an awkward situation in which IRQs are on but NMIs
> +	 * are off.
> +	 */
> +
> +	SWAPGS
> +	cld
> +	movq	%rsp, %rdx
> +	movq	PER_CPU_VAR(kernel_stack), %rsp

Note, this differs from what is in 4.2-rc, and what was in Ben's
backported version for 4.0, because we don't have a KERNEL_STACK_OFFSET
anymore in 4.1, and we don't yet have cpu_current_top_of_stack either.

So odds are, this is wrong, but if so, what should I do here for 4.1?
Backport the cpu_current_top_of_stack logic?

hints greatly appreciated...
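
[For comparison, and from memory of the upstream commit rather than the
4.1 queue: the 4.2-rc version of this hunk that the note above refers to
appears to switch to the thread stack via cpu_current_top_of_stack
instead of kernel_stack, roughly:

	SWAPGS
	cld
	movq	%rsp, %rdx
	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp	/* switch to thread stack */

where cpu_current_top_of_stack, for 64-bit assembly in 4.2-rc, should
resolve to the sp0 slot of the per-cpu cpu_tss (cpu_tss + TSS_sp0).
Treat this as a sketch of the upstream form only, not as the answer for
the 4.1 backport.]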
thanks,

greg k-h

> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
> +	pushq	3*8(%rdx)	/* pt_regs->flags */
> +	pushq	2*8(%rdx)	/* pt_regs->cs */
> +	pushq	1*8(%rdx)	/* pt_regs->rip */
> +	pushq	$-1		/* pt_regs->orig_ax */
> +	pushq	%rdi		/* pt_regs->di */
> +	pushq	%rsi		/* pt_regs->si */
> +	pushq	(%rdx)		/* pt_regs->dx */
> +	pushq	%rcx		/* pt_regs->cx */
> +	pushq	%rax		/* pt_regs->ax */
> +	pushq	%r8		/* pt_regs->r8 */
> +	pushq	%r9		/* pt_regs->r9 */
> +	pushq	%r10		/* pt_regs->r10 */
> +	pushq	%r11		/* pt_regs->r11 */
> +	pushq	%rbx		/* pt_regs->rbx */
> +	pushq	%rbp		/* pt_regs->rbp */
> +	pushq	%r12		/* pt_regs->r12 */
> +	pushq	%r13		/* pt_regs->r13 */
> +	pushq	%r14		/* pt_regs->r14 */
> +	pushq	%r15		/* pt_regs->r15 */
> +
> +	/*
> +	 * At this point we no longer need to worry about stack damage
> +	 * due to nesting -- we're on the normal thread stack and we're
> +	 * done with the NMI stack.
> +	 */
> +	movq	%rsp, %rdi
> +	movq	$-1, %rsi
> +	call	do_nmi
> +
>  	/*
> -	 * If %cs was not the kernel segment, then the NMI triggered in user
> -	 * space, which means it is definitely not nested.
> +	 * Return back to user mode.  We must *not* do the normal exit
> +	 * work, because we don't want to enable interrupts.  Fortunately,
> +	 * do_nmi doesn't modify pt_regs.
>  	 */
> -	cmpl $__KERNEL_CS, 16(%rsp)
> -	jne first_nmi
> +	SWAPGS
> +	jmp	restore_c_regs_and_iret
> 
> +.Lnmi_from_kernel:
>  	/*
>  	 * Check the special variable on the stack to see if NMIs are
>  	 * executing.
> 
> 
> Patches currently in stable-queue which might be from luto@xxxxxxxxxx are
> 
> queue-4.1/x86-nmi-enable-nested-do_nmi-handling-for-64-bit-kernels.patch
> queue-4.1/x86-nmi-64-switch-stacks-on-userspace-nmi-entry.patch
> queue-4.1/x86-nmi-64-remove-asm-code-that-saves-cr2.patch
> queue-4.1/x86-nmi-64-use-df-to-avoid-userspace-rsp-confusing-nested-nmi-detection.patch
> queue-4.1/x86-asm-entry-64-remove-pointless-jump-to-irq_return.patch
> queue-4.1/x86-nmi-64-reorder-nested-nmi-checks.patch
> queue-4.1/x86-nmi-64-improve-nested-nmi-comments.patch
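
[A side note on the pt_regs copy in the hunk quoted above: at the point
of the pushq sequence, %rdx still holds the old NMI (IST) stack pointer,
so the 1*8(%rdx)..5*8(%rdx) offsets simply re-push the hardware-generated
frame that sits above the earlier "pushq_cfi %rdx".  The old-stack layout
being indexed is, per the patch's own comments:

	/* old NMI stack, addressed through %rdx */
	/* 5*8(%rdx): SS      (pushed by hardware on NMI delivery)   */
	/* 4*8(%rdx): RSP                                            */
	/* 3*8(%rdx): RFLAGS                                         */
	/* 2*8(%rdx): CS                                             */
	/* 1*8(%rdx): RIP                                            */
	/* 0*8(%rdx): saved %rdx (the "pushq_cfi %rdx" at NMI entry) */

After that, the remaining pushq instructions fill out the rest of struct
pt_regs on the thread stack, and do_nmi is called with %rdi pointing at
that pt_regs and %rsi = -1 as the error code.]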