On Sun, May 4, 2014 at 9:27 PM, Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
> [CC+=Rich Felker, because the discussion started with a reference to
> http://ewontfix.com/14/ ]
>
> On 05/04/2014 12:18 AM, Steven Stewart-Gallus wrote:
>>
>> ----- Original Message -----
>> From: Jann Horn <jann@xxxxxxxxx>
>> Date: Saturday, May 3, 2014 10:45 am
>> Subject: Re: execve is not atomic, what is the exit state of the process when
>> execve fails after throwing away the original process image?
>> To: Steven Stewart-Gallus <sstewartgallus00@xxxxxxxxxxxxxxx>
>> Cc: linux-api@xxxxxxxxxxxxxxx
>>
>>> On Fri, May 02, 2014 at 02:19:52AM +0000, Steven Stewart-Gallus wrote:
>>>> execve is not atomic, what is the exit state of the process when
>>>> execve fails after throwing away the original process image?
>>>
>>> See http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740 or
>>> so – as far as I know, the kernel sends a SIGKILL. Does that help?
>>
>> Thank you Jann Horn.
>> http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740
>> answers my question.
>>
>> On reflection, the kernel code makes sense. The process must either
>> exit with an error code or raise the SIGKILL signal because SIGKILL
>> and SIGSTOP are the only unblockable signals (of course, the kernel
>> has the privileges to do whatever it wants but it tries to be
>> consistent with userspace).
>>
>> Strangely, in other places the SIGSEGV is sent when the ELF file is
>> incorrect in some places and I don't fully understand that part of the
>> code. Still, I understand enough to look at the code in more detail
>> later.
>>
>> Thank you,
>> Steven Stewart-Gallus
>>
>> P.S.
>>
>> I'm CC'ing Michael because he wanted to know this case so could
>> document it.
>
> Fair enough. I plan to add the following text to the execve(2) man
> page:
>
>        In most cases where execve() fails, control returns to the
>        original executable image, and the caller of execve() can then
>        handle the error. However, in (rare) cases (typically caused
>        by resource exhaustion), failure may occur past the point of no
>        return: the original executable image has been torn down, but
>        the new image could not be completely built. In such cases,
>        the kernel kills the process with a SIGKILL signal.
>
> Comments?

It turns out to be not too hard to trigger this case. See, for example,
the attached pair of programs, and the shell log below.

Cheers,

Michael

# Beware: if you try the below, the OOM killer may kill something random
# (Okay, not random: probably it'll be that hog firefox ;-).)

# Disable memory overcommit (see proc(5))

$ sudo sh -c "echo 2 > /proc/sys/vm/overcommit_memory"

$ ./multi_fork_exec ./large_image
cnt = 0
cnt = 1
cnt = 2
cnt = 3
[...]
cnt = 213
cnt = 214
cnt = 215
        Child PID=26070
        Status: child killed by signal 9 (Killed)
        Child PID=26062
        Status: child killed by signal 9 (Killed)
        Child PID=26053
        Status: child killed by signal 9 (Killed)
        Child PID=25900
        Status: child killed by signal 9 (Killed)
[...]

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

/*#* multi_fork_exec.c

   Use with large_image.c to trigger this execve() case:

       In most cases where execve() fails, control returns to the
       original executable image, and the caller of execve() can then
       handle the error. However, in (rare) cases (typically caused
       by resource exhaustion), failure may occur past the point of no
       return: the original executable image has been torn down, but
       the new image could not be completely built. In such cases,
       the kernel kills the process with a SIGKILL signal.
*/
/*#** Change history

   04 May 14    Initial creation
*/
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

static void             /* Examine a wait() status using the W* macros */
printWaitStatus(const char *msg, int status)
{
    if (msg != NULL)
        printf("%s", msg);

    if (WIFEXITED(status)) {
        printf("child exited, status=%d\n", WEXITSTATUS(status));

    } else if (WIFSIGNALED(status)) {
        printf("child killed by signal %d (%s)",
                WTERMSIG(status), strsignal(WTERMSIG(status)));
#ifdef WCOREDUMP        /* Not in SUSv3, may be absent on some systems */
        if (WCOREDUMP(status))
            printf(" (core dumped)");
#endif
        printf("\n");

    } else if (WIFSTOPPED(status)) {
        printf("child stopped by signal %d (%s)\n",
                WSTOPSIG(status), strsignal(WSTOPSIG(status)));

#ifdef WIFCONTINUED     /* SUSv3 has this, but older Linux versions and
                           some other UNIX implementations don't */
    } else if (WIFCONTINUED(status)) {
        printf("child continued\n");
#endif

    } else {            /* Should never happen */
        printf("what happened to this child? (status=%x)\n",
                (unsigned int) status);
    }
}

static void             /* Handler for child termination signal */
grimReaper(int sig)
{
    int status;         /* Child status from waitpid() */
    pid_t pid;
    int savedErrno;

    /* NOTE: printf() is not async-signal-safe; that is tolerable in a
       small test program, but not in production code */

    savedErrno = errno;

    /* waitpid() is called without WNOHANG, so the handler blocks here
       until all existing children have terminated */

    while ((pid = waitpid(-1, &status, 0)) > 0) {
        printf("\tChild PID=%ld\n", (long) pid);
        printWaitStatus("\tStatus: ", status);
    }

    /* ECHILD just means that there are no more children to reap */

    if (pid == -1 && errno != ECHILD)
        errExit("waitpid");

    errno = savedErrno;
}

int
main(int argc, char *argv[])
{
    int cnt;
    pid_t cpid;
    struct sigaction sa;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s program [arg...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Set up handler to reap dead children */

    sa.sa_flags = 0;
    sa.sa_handler = grimReaper;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        errExit("sigaction");

    /* Create multiple children, each of which execs the program
       named in argv[1] */

    for (cnt = 0; ; cnt++) {
        printf("cnt = %d\n", cnt);

        cpid = fork();
        if (cpid == -1)
            errExit("fork");

        if (cpid == 0) {        /* Child */
            execv(argv[1], &argv[1]);
            errExit("execv");
        }

        /* Parent continues round loop */
    }

    exit(EXIT_SUCCESS);
}

/*#* large_image.c
*/
/*#** Change history

   04 May 14    Initial creation
*/
#include <unistd.h>
#include <stdlib.h>

/* Make this image large, to chew up a good bit of RAM/swap */

char buf[100 * 1000 * 1000];

int
main(int argc, char *argv[])
{
    sleep(30);
    exit(EXIT_SUCCESS);
}