On 2018/11/30 5:22, Kees Cook wrote: > On Tue, Nov 27, 2018 at 8:38 PM Wang, Dongsheng > <dongsheng.wang@xxxxxxxxxxxxxxxx> wrote: >> Hello Kees, >> >> On 2018/11/28 6:38, Kees Cook wrote: >>> On Thu, Nov 22, 2018 at 11:54 PM, Wang Dongsheng >>> <dongsheng.wang@xxxxxxxxxxxxxxxx> wrote: >>>> When select ARCH_TASK_STRUCT_ON_STACK the first of thread_info variable >>>> is overwritten by STACK_END_MAGIC. In fact, the ARCH_TASK_STRUCT_ON_STACK >>>> is not a real task on stack, it's only init_task on init_stack. >>>> >>>> Commit 0500871f21b2 ("Construct init thread stack in the linker script >>>> rather than by union") added this macro and put task_strcut into >>>> thread_union. This brings us the following possibilities: >>>> TASK_ON_STACK THREAD_INFO_IN_TASK STACK >>>> ----- <-- thread_info & stack >>>> N N | | --- <-- task >>>> | | | | >>>> ----- --- >>>> >>>> ----- <-- stack >>>> N Y | | --- <-- task(Including thread_info) >>>> | | | | >>>> ----- --- >>>> >>>> ----- <-- stack & task & thread_info >>>> Y N | | >>>> | | >>>> ----- >>>> >>>> ----- <-- stack & task(Including thread_info) >>>> Y Y | | >>>> | | >>>> ----- >>>> The kernel has handled the first two cases correctly. >>>> >>>> For the third case: >>>> TASK_ON_STACK: Y. THREAD_INFO_IN_TASK: N. this case >>>> should never happen, because the task and thread_info will overlap. So >>>> when TASK_ON_STACK is selected, THREAD_INFO_IN_TASK must be selected too. >>>> >>>> For the fourth case: >>>> When task on stack, the end of stack should add a sizeof(task_struct) offset. >>>> >>>> This patch handled with the third and fourth case. >>>> >>>> Fixes: 0500871f21b2 ("Construct init thread stack in the linker ...") >>>> >>>> Signed-off-by: Wang Dongsheng <dongsheng.wang@xxxxxxxxxxxxxxxx> >>>> Signed-off-by: Shunyong Yang <shunyong.yang@xxxxxxxxxxxxxxxx> >>>> --- >>>> arch/Kconfig | 1 + >>>> include/linux/sched/task_stack.h | 5 ++++- >>>> 2 files changed, 5 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/arch/Kconfig b/arch/Kconfig >>>> index e1e540ffa979..0a2c73e73195 100644 >>>> --- a/arch/Kconfig >>>> +++ b/arch/Kconfig >>>> @@ -251,6 +251,7 @@ config ARCH_HAS_SET_MEMORY >>>> # Select if arch init_task must go in the __init_task_data section >>>> config ARCH_TASK_STRUCT_ON_STACK >>>> bool >>>> + depends on THREAD_INFO_IN_TASK || IA64 >>> The "IA64" part shouldn't be needed since IA64 already selects it. >>> >>> Since it's selected, it also can't have a depends, IIUC. >> Since the IA64 thread_info including task_struct, it doesn't need to >> select THREAD_INFO_IN_TASK. >> So we need to allow IA64 select ARCH_TASK_STRUCT_ON_STACK without >> THREAD_INFO. > Okay. > >>>> # Select if arch has its private alloc_task_struct() function >>>> config ARCH_TASK_STRUCT_ALLOCATOR >>>> diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h >>>> index 6a841929073f..624c48defb9e 100644 >>>> --- a/include/linux/sched/task_stack.h >>>> +++ b/include/linux/sched/task_stack.h >>>> @@ -7,6 +7,7 @@ >>>> */ >>>> >>>> #include <linux/sched.h> >>>> +#include <linux/sched/task.h> >>>> #include <linux/magic.h> >>>> >>>> #ifdef CONFIG_THREAD_INFO_IN_TASK >>>> @@ -25,7 +26,9 @@ static inline void *task_stack_page(const struct task_struct *task) >>>> >>>> static inline unsigned long *end_of_stack(const struct task_struct *task) >>>> { >>>> - return task->stack; >>>> + if (!IS_ENABLED(CONFIG_ARCH_TASK_STRUCT_ON_STACK) || task != &init_task) >>>> + return task->stack; >>>> + return (unsigned long *)(task + 1); >>>> } >>> This seems like a strange place for the change. It feels more like >>> init_task has been defined incorrectly. >> The init_task will put into init_stack when ARCH_TASK_STRUCT_ON_STACK is >> selected. >> include/asm-generic/vmlinux.lds.h: >> #define INIT_TASK_DATA(align) \ >> . = ALIGN(align); \ >> __start_init_task = .; \ >> init_thread_union = .; \ >> init_stack = .; \ >> KEEP(*(.data..init_task)) \ >> KEEP(*(.data..init_thread_info)) \ >> . = __start_init_task + THREAD_SIZE; \ >> __end_init_task = .; >> >> So we need end_of_stack to offset sizeof(task_struct). > Well, I guess I mean I'd rather the end_of_stack() code not be > special-cased in the if. The default should be how it was. Perhaps: > > if (IS_ENABLED(CONFIG_ARCH_TASK_STRUCT_ON_STACK) && task == &init_task) > return (unsigned long *)(task + 1); > return task->stack; > > But it still feels strange: why can't task->stack point to the correct > place in this special case? Normally, task->stack is the bottom of the stack, the end_of_stack just need to return task->stack. The task->stack always represents the bottom of the stack, and it cannot be changed, so what happens here is we have some data(task or thread info)that we want to put at the bottom of the stack. The end_of_stack just refers to the size of the stack available to us. In ARCH_TASK_STRUCT_ON_STACK case, the init_task has been placed in a fixed location, the task is placed at the bottom of the stack. Current end_of_stack doesn't handle this situation, so we need add a if condition to handle it. And this is just like !THREAD_INFO_IN_TASK(the thead_info on stack), the thread_info is placed at the bottom of the stack, the end_of_stack = the bottom of stack + sizeof(*thread_info). Cheers, Dongsheng