Re: execve is not atomic, what is the exit state of the process when execve fails after throwing away the original process image?

On Sun, May 4, 2014 at 9:27 PM, Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
> [CC+=Rich Felker, because the discussion started with a reference to
> http://ewontfix.com/14/ ]
>
> On 05/04/2014 12:18 AM, Steven Stewart-Gallus wrote:
>>
>> ----- Original Message -----
>> From: Jann Horn <jann@xxxxxxxxx>
>> Date: Saturday, May 3, 2014 10:45 am
>> Subject: Re: execve is not atomic, what is the exit state of the process when
>> execve fails after throwing away the original process image?
>> To: Steven Stewart-Gallus <sstewartgallus00@xxxxxxxxxxxxxxx>
>> Cc: linux-api@xxxxxxxxxxxxxxx
>>
>>> On Fri, May 02, 2014 at 02:19:52AM +0000, Steven Stewart-Gallus wrote:
>>>> execve is not atomic, what is the exit state of the process when
>>>> execve fails after throwing away the original process image?
>>>
>>> See http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740 or
>>> so – as far as I know, the kernel sends a SIGKILL. Does that help?
>>
>> Thank you Jann
>> Horn. http://lxr.free-electrons.com/source/fs/binfmt_elf.c#L740
>> answers my question.
>>
>> On reflection, the kernel code makes sense. The process must either
>> exit with an error code or raise the SIGKILL signal because SIGKILL
>> and SIGSTOP are the only unblockable signals (of course, the kernel
>> has the privileges to do whatever it wants but it tries to be
>> consistent with userspace).
>>
>> Strangely, in some other places SIGSEGV is sent instead when the ELF
>> file is invalid, and I don't fully understand that part of the
>> code. Still, I understand enough to look at the code in more detail
>> later.
>>
>> Thank you,
>> Steven Stewart-Gallus
>>
>> P.S.
>>
>> I'm CC'ing Michael because he wanted to know about this case so he
>> could document it.
>
> Fair enough. I plan to add the following text to the execve(2) man
> page:
>
>        In most cases where execve()  fails,  control  returns  to  the
>        original  executable image, and the caller of execve() can then
>        handle the error.  However, in (rare) cases  (typically  caused
>        by resource exhaustion), failure may occur past the point of no
>        return: the original executable image has been torn  down,  but
>        the  new  image  could not be completely built.  In such cases,
>        the kernel kills the process with a SIGKILL signal.
>
> Comments?

It turns out to be not too hard to trigger this case. See, for
example, the attached pair of programs, and the shell log below.

Cheers,

Michael

# Beware: if you try the below, the OOM killer may kill something random
# (Okay, not random: probably it'll be that hog firefox ;-).)

# Disable memory overcommit (see proc(5))
$ sudo sh -c "echo 2 > /proc/sys/vm/overcommit_memory"

$ ./multi_fork_exec ./large_image
cnt = 0
cnt = 1
cnt = 2
cnt = 3
[...]
cnt = 213
cnt = 214
cnt = 215
    Child PID=26070
    Status: child killed by signal 9 (Killed)
    Child PID=26062
    Status: child killed by signal 9 (Killed)
    Child PID=26053
    Status: child killed by signal 9 (Killed)
    Child PID=25900
    Status: child killed by signal 9 (Killed)
[...]
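
For readers who want to tell the two failure modes apart from the parent's side, here is a minimal sketch. It is not part of the attached programs. The names try_exec, exec_outcome and EXEC_FAILED_STATUS are hypothetical, and using exit status 127 for "exec failed" is only the usual shell convention. Both tests are heuristics: a SIGKILL may come from anywhere (the OOM killer, another user), and a program that runs normally may itself exit with 127.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification of how one fork()+execv() attempt ended */
enum exec_outcome {
    EXEC_RAN,           /* New image ran and terminated on its own */
    EXEC_FAILED_EARLY,  /* execv() returned -1; the old image was intact */
    EXEC_KILLED_LATE,   /* Child died from SIGKILL (possibly a late failure) */
    EXEC_OTHER          /* fork()/waitpid() error, or some other signal */
};

/* Assumption: 127 follows the shell convention for "could not exec" */
#define EXEC_FAILED_STATUS 127

static enum exec_outcome
try_exec(char *const argv[])
{
    int status;
    pid_t pid;

    pid = fork();
    if (pid == -1)
        return EXEC_OTHER;

    if (pid == 0) {     /* Child */
        execv(argv[0], argv);
        /* Reached only if execv() failed before the point of no return;
           errno is valid here and the original image is still running */
        _exit(EXEC_FAILED_STATUS);
    }

    /* Parent: wait for this child, retrying if interrupted by a signal */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return EXEC_OTHER;
    }

    if (WIFSIGNALED(status))
        return (WTERMSIG(status) == SIGKILL) ? EXEC_KILLED_LATE : EXEC_OTHER;
    if (WIFEXITED(status) && WEXITSTATUS(status) == EXEC_FAILED_STATUS)
        return EXEC_FAILED_EARLY;
    return EXEC_RAN;
}
```

Calling try_exec() with an argv whose first element is a pathname that does not exist should yield EXEC_FAILED_EARLY, since execv() fails with ENOENT before the old image is touched. The children in the log above would be classified as EXEC_KILLED_LATE.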

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
/*#* multi_fork_exec.c 
 
   Use with large_image.c to trigger this execve() case:


      In most cases where execve()  fails,  control  returns  to  the
      original  executable image, and the caller of execve() can then
      handle the error.  However, in (rare) cases  (typically  caused
      by resource exhaustion), failure may occur past the point of no
      return: the original executable image has been torn  down,  but
      the  new  image  could not be completely built.  In such cases,
      the kernel kills the process with a SIGKILL signal.
*/
/*#**
   Change history

   04 May 14	Initial creation
*/
#define _GNU_SOURCE     /* For strsignal() declaration in <string.h> */
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>     /* For sigaction() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

#define errExit(msg) 	do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)


static void 	/* Examine a wait() status using the W* macros */
printWaitStatus(const char *msg, int status)
{
    if (msg != NULL)
        printf("%s", msg);

    if (WIFEXITED(status)) {
        printf("child exited, status=%d\n", WEXITSTATUS(status));

    } else if (WIFSIGNALED(status)) {
        printf("child killed by signal %d (%s)",
                WTERMSIG(status), strsignal(WTERMSIG(status)));
#ifdef WCOREDUMP    	/* Not in SUSv3, may be absent on some systems */
        if (WCOREDUMP(status))
            printf(" (core dumped)");
#endif
        printf("\n");

    } else if (WIFSTOPPED(status)) {
        printf("child stopped by signal %d (%s)\n",
                WSTOPSIG(status), strsignal(WSTOPSIG(status)));

#ifdef WIFCONTINUED 	/* SUSv3 has this, but older Linux versions and
                           some other UNIX implementations don't */
    } else if (WIFCONTINUED(status)) {
        printf("child continued\n");
#endif

    } else {		/* Should never happen */
        printf("what happened to this child? (status=%x)\n",
                (unsigned int) status);
    }
}

static void             /* Handler for child termination signal */
grimReaper(int sig)
{   
    int status;                 /* Child status from waitpid() */
    pid_t pid;
    int savedErrno;

    savedErrno = errno;

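    /* Block until every child has been reaped. waitpid() returns -1 with
       errno set to ECHILD once no children remain; any other error is
       fatal. (printf() is not async-signal-safe; tolerable in a demo.) */

    while ((pid = waitpid(-1, &status, 0)) > 0) {
        printf("\tChild PID=%ld\n", (long) pid);
        printWaitStatus("\tStatus: ", status);
    }
    if (pid == -1 && errno != ECHILD)
        errExit("waitpid");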
    errno = savedErrno;
}

int
main(int argc, char *argv[])
{
    int cnt;
    pid_t cpid;
    struct sigaction sa;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s program [arg...]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Set up handler to reap dead children */

    sa.sa_flags = 0;
    sa.sa_handler = grimReaper;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        errExit("sigaction");

    /* Create multiple children, each of which execs the program named in
       argv[1] */

    for (cnt = 0; ; cnt++) {
        printf("cnt = %d\n", cnt);

        cpid = fork();
        if (cpid == -1)
            errExit("fork");

        if (cpid == 0) {	/* Child */
            execv(argv[1], &argv[1]);
            errExit("execv");
        }

        /* Parent continues round loop */
    }

    exit(EXIT_SUCCESS);
}
/*#* large_image.c 
*/
/*#**
   Change history

   04 May 14	Initial creation
*/
#include <unistd.h>
#include <stdlib.h>

/* Make this image large, to chew up a good bit of RAM/swap */

char buf[100 * 1000 * 1000];

int
main(int argc, char *argv[])
{
    sleep(30);
    exit(EXIT_SUCCESS);
}
