On Wed, Aug 07, 2024 at 01:39:27PM +0100, Mark Brown wrote:
> On Tue, Aug 06, 2024 at 10:08:44PM -0700, Kees Cook wrote:
> > On Tue, Aug 06, 2024 at 04:10:02PM +0100, Mark Brown wrote:
> >
> > > # Running test 'Shadow stack with no token'
>
> > It took me a while to figure out where a thread switches shstk (even
> > without this series):
>
> > kernel_clone, copy_process, copy_thread, fpu_clone, update_fpu_shstk
> > (and shstk_alloc_thread_stack is called just before update_fpu_shstk).
>
> > I don't understand the token consumption in arch_shstk_post_fork(). This
> > wasn't needed before with the fixed-size new shstk, why is it needed
> > now?
>
> Concerns were raised on earlier rounds of review that since instead of
> allocating the shadow stack as part of creating the new thread we are
> using a previously allocated shadow stack, someone could use this as part
> of an exploit. You could just jump on top of any existing shadow stack
> and cause writes to it.
>
> > Anyway, my attempt to trace the shstk changes for the test:
>
> > write(1, "TAP version 13\n", 15) = 15
> > write(1, "1..2\n", 5) = 5
> > clone3({flags=0, exit_signal=18446744073709551615, stack=NULL, stack_size=0}, 104) = -1 EINVAL (Invalid argument)
> > write(1, "# clone3() syscall supported\n", 29) = 29
> > map_shadow_stack(NULL, 4096, 0) = 125837480497152
> > write(1, "# Shadow stack supportd\n", 24) = 24
> > write(1, "# Running test 'Shadow stack wit"..., 44) = 44
> > getpid() = 4943
> > write(1, "# [4943] Trying clone3() with fl"..., 51) = 51
> > map_shadow_stack(NULL, 4096, 0) = 125837480488960
> > clone3({flags=CLONE_VM, exit_signal=SIGCHLD, stack=NULL, stack_size=0, /* bytes 88..103 */ "\x00\xf0\x52\xd2\x72\x72\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00"} => {/* bytes 88..103 */ "\x00\xf0\x52\xd2\x72\x72\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00"}, 104) = 4944
> > getpid() = 4943
> > write(1, "# I am the parent (4943). My chi"..., 49strace: Process 4944 attached
> > ) = 49
> > [pid 4944] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_CPERR, si_addr=NULL} ---
> > [pid 4943] wait4(-1, <unfinished ...>
> > [pid 4944] +++ killed by SIGSEGV (core dumped) +++
>
> So we created the thread, then before we get to the wait4() in the
> parent we start delivering a SEGV_CPERR to the child. The flow for the
> child is as expected.
>
> > <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV && WCOREDUMP(s)}], __WALL, NULL) = 4944
> > --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=4944, si_uid=0, si_status=SIGSEGV, si_utime=0, si_stime=0} ---
> > --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7272d21fffe8} ---
> > +++ killed by SIGSEGV (core dumped) +++
>
> Then the parent gets an ordinary segfault, not a shadow stack specific
> one, like some memory got deallocated underneath it or a pointer got
> corrupted.
>
> > [ 569.153288] shstk_setup: clone3[4943] ssp:7272d2200000
> > [ 569.153998] process: copy_thread: clone3[4943] new_ssp:7272d2530000
> > [ 569.154002] update_fpu_shstk: clone3[4943] ssp:7272d2530000
> > [ 569.154008] shstk_post_fork: clone3[4944]
> > [ 569.154011] shstk_post_fork: clone3[4944] sending SIGSEGV post fork
>
> > I don't see an update_fpu_shstk for 4944? Should I with this test?
>
> I'd only expect to see one update; my understanding is that that update
> is for the child but happening in the context of the parent as the child
> is not yet started.

What I don't understand is this: the parent is 4943, so this report
makes sense:

> > [ 569.153288] shstk_setup: clone3[4943] ssp:7272d2200000

The child is 4944, yet I see:

> > [ 569.153998] process: copy_thread: clone3[4943] new_ssp:7272d2530000
> > [ 569.154002] update_fpu_shstk: clone3[4943] ssp:7272d2530000

These map to my logging:

copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
...
	new_ssp = shstk_alloc_thread_stack(p, args);
	pr_err("%s: %s[%d] new_ssp:%lx\n", __func__, p->comm, task_pid_nr(p), new_ssp);

and

update_fpu_shstk(struct task_struct *dst, unsigned long ssp)
...
	xstate->user_ssp = (u64)ssp;
	pr_err("%s: %s[%d] ssp:%lx\n", __func__, dst->comm, task_pid_nr(dst), ssp);

The child should be "p" (and "dst") here -- stuff is being copied from
current to p, but p is reporting itself as 4943 here? (Oh, this is
reporting pid, not tid... I bet that's what I've got wrong.)

> Does this help:
>
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index 27acbdf44c5f..d7005974aff5 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -258,6 +258,8 @@ unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
>  	if (args->shadow_stack) {
>  		addr = args->shadow_stack;
>  		size = args->shadow_stack_size;
> +		shstk->base = 0;
> +		shstk->size = 0;
>  	} else {
>  		/*
>  		 * For CLONE_VFORK the child will share the parents

I'll fix my reporting and give this patch a try too.

Thanks!

-Kees

-- 
Kees Cook