On Fri, May 21, 2021 at 03:12:04PM -0700, Yu-cheng Yu wrote: > diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c > index 5ea2b494e9f9..8e5f772181b9 100644 > --- a/arch/x86/kernel/shstk.c > +++ b/arch/x86/kernel/shstk.c > @@ -71,6 +71,53 @@ int shstk_setup(void) > return 0; > } > > +int shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags, > + unsigned long stack_size) > +{ > + struct thread_shstk *shstk = &tsk->thread.shstk; > + struct cet_user_state *state; > + unsigned long addr; > + > + if (!stack_size) > + return -EINVAL; I've been doing some light testing on AMD hardware and I've found that this version of the patchset doesn't boot for me. It appears that when systemd processes start spawning, they hit the above case, return -EINVAL, and the fork fails. In these cases, copy_thread has been passed 0 for both sp and stack_size. For previous versions of the patchset, I can still boot. When the stack_size check was last, the function would always return before completing the check, hitting one of the two cases below. At the very least, it would seem that on some systems, it isn't valid to rely on the stack_size passed from clone3, though I'm unsure what the correct behavior should be here. If the passed stack_size == 0 and sp == 0, is this a case where we want to alloc a shadow stack for this thread with some capped size? Alternatively, is this a case that isn't valid to alloc a shadow stack and we should simply return 0 instead of -EINVAL? I'm running Fedora 34 which satisfies the required versions of gcc, binutils, and glibc. Please let me know if there is any additional information I can provide. Thanks, John > + > + if (!shstk->size) > + return 0; > + > + /* > + * For CLONE_VM, except vfork, the child needs a separate shadow > + * stack. > + */ > + if ((clone_flags & (CLONE_VFORK | CLONE_VM)) != CLONE_VM) > + return 0; > + > + state = get_xsave_addr(&tsk->thread.fpu.state.xsave, XFEATURE_CET_USER); > + if (!state) > + return -EINVAL; > + > + /* > + * Compat-mode pthreads share a limited address space. > + * If each function call takes an average of four slots > + * stack space, allocate 1/4 of stack size for shadow stack. > + */ > + if (in_compat_syscall()) > + stack_size /= 4; > + > + stack_size = round_up(stack_size, PAGE_SIZE); > + addr = alloc_shstk(stack_size); > + if (IS_ERR_VALUE(addr)) { > + shstk->base = 0; > + shstk->size = 0; > + return PTR_ERR((void *)addr); > + } > + > + fpu__prepare_write(&tsk->thread.fpu); > + state->user_ssp = (u64)(addr + stack_size); > + shstk->base = addr; > + shstk->size = stack_size; > + return 0; > +} > + > void shstk_free(struct task_struct *tsk) > { > struct thread_shstk *shstk = &tsk->thread.shstk; > @@ -80,7 +127,13 @@ void shstk_free(struct task_struct *tsk) > !shstk->base) > return; > > - if (!tsk->mm) > + /* > + * When fork() with CLONE_VM fails, the child (tsk) already has a > + * shadow stack allocated, and exit_thread() calls this function to > + * free it. In this case the parent (current) and the child share > + * the same mm struct. > + */ > + if (!tsk->mm || tsk->mm != current->mm) > return; > > while (1) {