Kees Cook <keescook@xxxxxxxxxxxx> writes: > On Thu, Sep 23, 2021 at 07:12:33PM -0500, Eric W. Biederman wrote: >> >> Today when a signal is delivered with a handler of SIG_DFL whose >> default behavior is to generate a core dump not only that process but >> every process that shares the mm is killed. >> >> In the case of vfork this looks like a real world problem. Consider >> the following well defined sequence. >> >> if (vfork() == 0) { >> execve(...); >> _exit(EXIT_FAILURE); >> } >> >> If a signal that generates a core dump is received after vfork but >> before the execve changes the mm the process that called vfork will >> also be killed (as the mm is shared). >> >> Similarly if the execve fails after the point of no return the kernel >> delivers SIGSEGV which will kill both the exec'ing process and because >> the mm is shared the process that called vfork as well. >> >> As far as I can tell this behavior is a violation of people's >> reasonable expectations, POSIX, and is unnecessarily fragile when the >> system is low on memory. >> >> Solve this by making a userspace visible change to only kill a single >> process/thread group. This is possible because Jann Horn recently >> modified[1] the coredump code so that the mm can safely be modified >> while the coredump is happening. With LinuxThreads long gone I don't >> expect anyone to have a notice this behavior change in practice. >> >> To accomplish this move the core_state pointer from mm_struct to >> signal_struct, which allows different thread groups to coredump >> simultatenously. >> >> In zap_threads remove the work to kill anything except for the current >> thread group. >> >> [1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") >> Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") >> History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > > This looks correct to me, but depends on the 5/6 not introducing any > races. So, to that end: > > Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx> > > If you have some local tools you've been using for testing this series, > can you toss them into tools/testing/selftests/ptrace/ ? I can help > clean them up if want. I just have a program that goes multi-thread and calls abort, and a slight variant of it that calls vfork before calling abort. It is enough to exercise the code and verify I didn't make any typos. I have attached the code below. If you can help make it into a proper test that would be great. I have just been manually running gdb and the like to verify the kernel works as expected. Eric
#include <stdio.h> #include <pthread.h> #include <sys/ptrace.h> #include <sys/types.h> #include <signal.h> #include <unistd.h> #include <stdlib.h> #include <stdio.h> #include <string.h> #include <errno.h> struct params { int argc; char **argv; char **envp; pthread_t parent; pthread_t sibling[30]; }; static void *dump_thread(void *arg) { struct params *params = arg; void *retval; int i; pthread_join(params->parent, &retval); fprintf(stdout, "Waiting for 5s\n"); sleep(5); fprintf(stdout, "Dumping\n"); abort(); fprintf(stdout, "Abort Failed: %d %s\n", errno, strerror(errno)); for (i = 0; i <= 29; i++) { pthread_join(params->sibling[i], &retval); } fprintf(stdout, "All Done!\n"); _exit(EXIT_FAILURE); return NULL; } static void *idle_thread(void *arg) { unsigned long i = (unsigned long)arg; sleep(10); fprintf(stdout, "Done %lu\n", i); fflush(stdout); return NULL; } int main(int argc, char **argv, char **envp) { struct params *params; pthread_t pt; unsigned long i; params = malloc(sizeof(struct params)); params->argc = argc - 1; params->argv = argv = argv + 1; params->envp = envp; params->parent = pthread_self(); pthread_create(&pt, NULL, dump_thread, params); for (i = 0; i <= 29; i++) pthread_create(¶ms->sibling[i], NULL, idle_thread, (void *)i); pthread_exit(NULL); return 0; }