On Tue, Apr 27, 2021 at 01:43:08PM -0700, Yu-cheng Yu wrote:
> @@ -181,6 +184,12 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg,
> 	if (clone_flags & CLONE_SETTLS)
> 		ret = set_new_tls(p, tls);
> 
> +#ifdef CONFIG_X86_64

IS_ENABLED

> +	/* Allocate a new shadow stack for pthread */
> +	if (!ret)
> +		ret = shstk_setup_thread(p, clone_flags, stack_size);
> +#endif
> +

And why is this addition here...

> 	if (!ret && unlikely(test_tsk_thread_flag(current, TIF_IO_BITMAP)))
> 		io_bitmap_share(p);

... instead of here?	<---

> 
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index c815c7507830..d387df84b7f1 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -70,6 +70,55 @@ int shstk_setup(void)
>  	return 0;
>  }
> 
> +int shstk_setup_thread(struct task_struct *tsk, unsigned long clone_flags,

Judging by what this function does, its name wants to be

	shstk_alloc_thread_stack()

or so?

> +			unsigned long stack_size)
> +{
> +	unsigned long addr, size;
> +	struct cet_user_state *state;
> +	struct cet_status *cet = &tsk->thread.cet;

The tip-tree preferred ordering of variable declarations at the
beginning of a function is reverse fir tree order::

	struct long_struct_name *descriptive_name;
	unsigned long foo, bar;
	unsigned int tmp;
	int ret;

The above is faster to parse than the reverse ordering::

	int ret;
	unsigned int tmp;
	unsigned long foo, bar;
	struct long_struct_name *descriptive_name;

And even more so than random ordering::

	unsigned long foo, bar;
	int ret;
	struct long_struct_name *descriptive_name;
	unsigned int tmp;

> +
> +	if (!cet->shstk_size)
> +		return 0;
> +

This check needs a comment.

> +	if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM)
> +		return 0;
> +
> +	state = get_xsave_addr(&tsk->thread.fpu.state.xsave, XFEATURE_CET_USER);

Let that line stick out.

> +
> +	if (!state)
> +		return -EINVAL;
> +
> +	if (stack_size == 0)

	if (!stack_size)

> +		return -EINVAL;

and that test needs to be done first in the function.

> +
> +	/* Cap shadow stack size to 4 GB */

Why?

> +	size = min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G);
> +	size = min(size, stack_size);
> +
> +	/*
> +	 * Compat-mode pthreads share a limited address space.
> +	 * If each function call takes an average of four slots
> +	 * stack space, allocate 1/4 of stack size for shadow stack.
> +	 */
> +	if (in_compat_syscall())
> +		size /= 4;

<---- newline here.

> +	size = round_up(size, PAGE_SIZE);
> +	addr = alloc_shstk(size);
> +

	^ Superfluous newline.

> +	if (IS_ERR_VALUE(addr)) {
> +		cet->shstk_base = 0;
> +		cet->shstk_size = 0;
> +		return PTR_ERR((void *)addr);
> +	}
> +
> +	fpu__prepare_write(&tsk->thread.fpu);
> +	state->user_ssp = (u64)(addr + size);

cet_user_state has u64, cet_status has unsigned longs. Make them all
u64.

And since cet_status is per thread, but I had suggested struct
shstk_desc, I think now that that should be called

	struct thread_shstk

or so to denote *exactly* what it is.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette