Re: [PATCH-v3.14.y 3/6] x86/nmi/64: Switch stacks on userspace NMI entry

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08/18/2015, 07:12 PM, Thomas D. wrote:
>>> --- a/arch/x86/kernel/entry_64.S
>>> +++ b/arch/x86/kernel/entry_64.S
>>> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>>>  	 * a nested NMI that updated the copy interrupt stack frame, a
>>>  	 * jump will be made to the repeat_nmi code that will handle the second
>>>  	 * NMI.
>>> +	 *
>>> +	 * However, espfix prevents us from directly returning to userspace
>>> +	 * with a single IRET instruction.  Similarly, IRET to user mode
>>> +	 * can fault.  We therefore handle NMIs from user space like
>>> +	 * other IST entries.
>>>  	 */
>>>  
>>>  	/* Use %rdx as out temp variable throughout */
>>>  	pushq_cfi %rdx
>>>  	CFI_REL_OFFSET rdx, 0
>>>  
>>> +	testb	$3, CS-RIP+8(%rsp)
>>> +	jz	.Lnmi_from_kernel
>>> +
>>> +	/*
>>> +	 * NMI from user mode.  We need to run on the thread stack, but we
>>> +	 * can't go through the normal entry paths: NMIs are masked, and
>>> +	 * we don't want to enable interrupts, because then we'll end
>>> +	 * up in an awkward situation in which IRQs are on but NMIs
>>> +	 * are off.
>>> +	 */
>>> +
>>> +	SWAPGS
>>> +	cld
>>> +	movq	%rsp, %rdx
>>> +	movq	PER_CPU_VAR(kernel_stack), %rsp
>>
>> I think you are wasting stack space here. With kernel_stack, you should
>> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
>> registers is pre-reserved at kernel_stack already. (Or use movq instead
>> of the 5 pushq below.)
>>
>> Why don't you re-use the 3.16's version anyway?
>>
>>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
>>> +	pushq	4*8(%rdx)	/* pt_regs->rsp */
>>> +	pushq	3*8(%rdx)	/* pt_regs->flags */
>>> +	pushq	2*8(%rdx)	/* pt_regs->cs */
>>> +	pushq	1*8(%rdx)	/* pt_regs->rip */
>>> +	pushq   $-1		/* pt_regs->orig_ax */
>>> +	pushq   %rdi		/* pt_regs->di */
>>> +	pushq   %rsi		/* pt_regs->si */
>>> +	pushq   (%rdx)		/* pt_regs->dx */
>>> +	pushq   %rcx		/* pt_regs->cx */
>>> +	pushq   %rax		/* pt_regs->ax */
>>> +	pushq   %r8		/* pt_regs->r8 */
>>> +	pushq   %r9		/* pt_regs->r9 */
>>> +	pushq   %r10		/* pt_regs->r10 */
>>> +	pushq   %r11		/* pt_regs->r11 */
>>> +	pushq	%rbx		/* pt_regs->rbx */
>>> +	pushq	%rbp		/* pt_regs->rbp */
>>> +	pushq	%r12		/* pt_regs->r12 */
>>> +	pushq	%r13		/* pt_regs->r13 */
>>> +	pushq	%r14		/* pt_regs->r14 */
>>> +	pushq	%r15		/* pt_regs->r15 */
> 
> Mh, so you mean
> 
>> +	addq	$KERNEL_STACK_OFFSET, %rsp
> 
> between
> 
>> +	movq	PER_CPU_VAR(kernel_stack), %rsp
> 
> and
> 
>> +	pushq	5*8(%rdx)	/* pt_regs->ss */
> 
> is missing?

Yep, that makes sense. But I am not an x86 maintainer :P.

-- 
js
suse labs
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]