Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Wed, 13 Mar 2024 11:23:44 +0100

On Mon, Mar 11 2024 at 16:46, Pasha Tatashin wrote:
> @@ -413,6 +413,9 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
>  	}
>  #endif
>  
> +	if (dynamic_stack_fault(current, address))
> +		return;
> +
>  	irqentry_nmi_enter(regs);
>  	instrumentation_begin();
>  	notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index d6375b3c633b..651c558b10eb 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1198,6 +1198,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
>  	if (is_f00f_bug(regs, hw_error_code, address))
>  		return;
>  
> +	if (dynamic_stack_fault(current, address))
> +		return;

T1 schedules out with stack used close to the fault boundary.

   switch_to(T2)

Now T1 schedules back in

   switch_to(T1)
     __switch_to_asm()
        ...
        switch_stacks()         <- SP on T1 stack
!        ...                    
!       jmp __switch_to()
!    __switch_to()
!       ...
!       raw_cpu_write(pcpu_hot.current_task, next_p);

After switching SP to T1's stack and up to the point where
pcpu_hot.current_task (aka current) is updated to T1 a stack fault will
invoke dynamic_stack_fault(T2, address) which will return false here:

	/* check if address is inside the kernel stack area */
	stack = (unsigned long)tsk->stack;
	if (address < stack || address >= stack + THREAD_SIZE)
		return false;

because T2's stack does obviously not cover the faulting address on T1's
stack. As a consequence double fault will panic the machine.

Thanks,

        tglx