On Sun, Feb 10, 2019 at 11:05:52AM -0600, Eric W. Biederman wrote:
> Ivan Delalande <colona@xxxxxxxxxx> writes:
> > A difference I've noticed with your tree (unrelated to my issue here but
> > that you may want to look at) is when I run my reproducer under
> > strace -f, I'm now getting quite a lot of "Exit of unknown pid 12345
> > ignored" warnings from strace, which I've never seen with mainline.
> > My reproducer simply fork-exec tail processes in a loop, and tries to
> > sigkill them in the parent with a variable delay.
>
> What was your base tree?

It was just off v5.0-rc5, and I didn't see these warnings on the last
few RCs either. Now I'm seeing them on vanilla v5.0-rc6 as well.

> My best guess is that your SIGKILL is getting there before strace
> realizes the process has been forked. If we can understand the race
> it is probably worth fixing.
>
> Any chance you can post your reproducer.

Sure, see the attachment. I think this is the simplest version where
these warnings show up. This one just forks/exec `tail -a` to make it
fail and exit 1 as soon as possible, and progressively increase the
delay between the fork and sigkill to try to hit our original issue,
stopping and restarting only after 10 completions of the child as the
timing varies a fair bit. Running this program under
`strace -f -o /dev/null` prints the warnings almost instantly on my
system.

> It is possible it is my most recent fixes, or it is possible something
> changed from the tree you were testing and the tree you are working
> on.

Thanks,

-- 
Ivan Delalande
Arista Networks

#define _GNU_SOURCE
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>

int main(void)
{
	pid_t pid;
	int status;
	size_t i, count;
	unsigned long max = 300000, first;
	struct timespec ts = { .tv_nsec = 1 };
	char* const argv[] = {"/bin/tail", "-a", NULL};

	for (i = 0; i < 42000; ++i) {
		for (count = first = 0, ts.tv_nsec = 1;
		     ts.tv_nsec < max && count < 10; ts.tv_nsec += 1) {
			if ((pid = fork())) {
				if (pid < 0)
					continue;
				nanosleep(&ts, NULL);
				kill(pid, SIGKILL);
				if (waitpid(pid, &status, 0) != pid)
					continue;
				if (WIFSIGNALED(status) && WTERMSIG(status) == 9) {
					continue;
				} else if (WIFEXITED(status) && WEXITSTATUS(status) == 1) {
					count++;
					if (!first)
						first = ts.tv_nsec;
				} else
					printf("%lu: %x\n", ts.tv_nsec, status);
			} else {
				close(STDOUT_FILENO);
				close(STDERR_FILENO);
				execve("/bin/tail", argv, NULL);
				_exit(2);
			}
		}
		if (max < ts.tv_nsec)
			max = ts.tv_nsec;
		if (count < 10)
			max += 5000;
		printf("break at %lu (max: %lu) count %lu (first at %lu)\n",
		       ts.tv_nsec, max, count, first);
	}
	return 0;
}
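
For anyone who only wants to poke at a single fork/exec/SIGKILL round without the adaptive delay loop, the core step of the reproducer above can be pulled out roughly as follows. This is only a sketch: `kill_after_delay()` and `enum outcome` are hypothetical names that do not appear in the attachment, and unlike the attachment the child here does not close stdout/stderr and gets an empty environment instead of a NULL envp.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper types, not part of the attached reproducer. */
enum outcome { OUTCOME_ERROR = -1, OUTCOME_KILLED = 0, OUTCOME_EXITED = 1 };

/*
 * Fork, exec path/argv in the child, SIGKILL the child from the parent
 * after delay_ns nanoseconds, and report how the child ended.
 * delay_ns must be in [0, 999999999] (nanosleep() rejects larger tv_nsec).
 * On OUTCOME_EXITED, *exit_code holds the child's exit status.
 */
static enum outcome kill_after_delay(const char *path, char *const argv[],
				     long delay_ns, int *exit_code)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_ns };
	int status;
	pid_t pid;

	assert(delay_ns >= 0 && delay_ns < 1000000000L);

	pid = fork();
	if (pid < 0)
		return OUTCOME_ERROR;

	if (pid == 0) {
		char *const envp[] = { NULL };

		execve(path, argv, envp);
		_exit(2);	/* only reached if execve() failed */
	}

	/* Parent: wait for the requested delay, then race the child. */
	nanosleep(&ts, NULL);
	kill(pid, SIGKILL);
	if (waitpid(pid, &status, 0) != pid)
		return OUTCOME_ERROR;

	if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
		return OUTCOME_KILLED;	/* SIGKILL won the race */
	if (WIFEXITED(status)) {
		*exit_code = WEXITSTATUS(status);
		return OUTCOME_EXITED;	/* child finished first */
	}
	return OUTCOME_ERROR;
}
```

Calling it with "/bin/tail", the same argv as in the attachment and a delay of a few nanoseconds should correspond to one iteration of the inner loop; whether a given delay ends in OUTCOME_KILLED or OUTCOME_EXITED depends on timing, which is the point of the original sweep.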