On 08/18/2015, 07:12 PM, Thomas D. wrote: >>> --- a/arch/x86/kernel/entry_64.S >>> +++ b/arch/x86/kernel/entry_64.S >>> @@ -1715,19 +1715,88 @@ ENTRY(nmi) >>> * a nested NMI that updated the copy interrupt stack frame, a >>> * jump will be made to the repeat_nmi code that will handle the second >>> * NMI. >>> + * >>> + * However, espfix prevents us from directly returning to userspace >>> + * with a single IRET instruction. Similarly, IRET to user mode >>> + * can fault. We therefore handle NMIs from user space like >>> + * other IST entries. >>> */ >>> >>> /* Use %rdx as out temp variable throughout */ >>> pushq_cfi %rdx >>> CFI_REL_OFFSET rdx, 0 >>> >>> + testb $3, CS-RIP+8(%rsp) >>> + jz .Lnmi_from_kernel >>> + >>> + /* >>> + * NMI from user mode. We need to run on the thread stack, but we >>> + * can't go through the normal entry paths: NMIs are masked, and >>> + * we don't want to enable interrupts, because then we'll end >>> + * up in an awkward situation in which IRQs are on but NMIs >>> + * are off. >>> + */ >>> + >>> + SWAPGS >>> + cld >>> + movq %rsp, %rdx >>> + movq PER_CPU_VAR(kernel_stack), %rsp >> >> I think you are wasting stack space here. With kernel_stack, you should >> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5 >> registers is pre-reserved at kernel_stack already. (Or use movq instead >> of the 5 pushq below.) >> >> Why don't you re-use the 3.16's version anyway? >> >>> + pushq 5*8(%rdx) /* pt_regs->ss */ >>> + pushq 4*8(%rdx) /* pt_regs->rsp */ >>> + pushq 3*8(%rdx) /* pt_regs->flags */ >>> + pushq 2*8(%rdx) /* pt_regs->cs */ >>> + pushq 1*8(%rdx) /* pt_regs->rip */ >>> + pushq $-1 /* pt_regs->orig_ax */ >>> + pushq %rdi /* pt_regs->di */ >>> + pushq %rsi /* pt_regs->si */ >>> + pushq (%rdx) /* pt_regs->dx */ >>> + pushq %rcx /* pt_regs->cx */ >>> + pushq %rax /* pt_regs->ax */ >>> + pushq %r8 /* pt_regs->r8 */ >>> + pushq %r9 /* pt_regs->r9 */ >>> + pushq %r10 /* pt_regs->r10 */ >>> + pushq %r11 /* pt_regs->r11 */ >>> + pushq %rbx /* pt_regs->rbx */ >>> + pushq %rbp /* pt_regs->rbp */ >>> + pushq %r12 /* pt_regs->r12 */ >>> + pushq %r13 /* pt_regs->r13 */ >>> + pushq %r14 /* pt_regs->r14 */ >>> + pushq %r15 /* pt_regs->r15 */ > > Mh, so you mean > >> + addq $KERNEL_STACK_OFFSET, %rsp > > between > >> + movq PER_CPU_VAR(kernel_stack), %rsp > > and > >> + pushq 5*8(%rdx) /* pt_regs->ss */ > > is missing? Yep, that makes sense. But I am not an x86 maintainer :P. -- js suse labs -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html