On Wed, May 14, 2014 at 05:26:29PM +1000, Michael Ellerman wrote:

 > > > Not sure what the correct fix is.
 > >
 > > I think just clearing mainpid before we call exit is the right thing to
 > > do here. I'll audit all the other exit() calls too, as this might be a
 > > problem in other paths.
 >
 > Thanks. That fix is working for me.
 >
 > It still exits after a minute or so, because it fails to fork a child in
 > fork_children().
 >
 > I have 64 cpus and 16GB of RAM, so that's only 250MB per child.
 >
 > If I reduce to 32 children then it runs much longer.
 >
 > I wonder though, should failing to fork a child be a fatal error? Or could it
 > just skip that child and continue.

Maybe. It could wait until another child exits before retrying.
Something like the patch below, maybe. I think I tried something like this
before though, and it resulted in a flood of failed forks.

Let me know how this works out.

	Dave

diff --git a/main.c b/main.c
index f393f81ae0ba..be7108287dc9 100644
--- a/main.c
+++ b/main.c
@@ -79,6 +79,10 @@ static void fork_children(void)
 			_exit(EXIT_SUCCESS);
 		} else {
 			if (pid == -1) {
+				/* We failed, wait for a child to exit before retrying. */
+				if (shm->running_childs > 0)
+					return;
+
 				output(0, "couldn't create child! (%s)\n", strerror(errno));
 				shm->exit_reason = EXIT_FORK_FAILURE;
 				exit_main_fail();
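
For anyone who wants to try the idea outside of trinity, here is a minimal
standalone sketch of "treat a failed fork() as non-fatal while other children
are still running". It is not trinity's code: running_childs is a plain static
counter standing in for shm->running_childs, and child_work(),
fork_children_sketch() and reap_one_child() are hypothetical names made up for
this example. Blocking in waitpid() before the next attempt is one way to
avoid a flood of failed forks, since the retry only happens once a child has
actually exited.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for trinity's shm->running_childs counter. */
static unsigned int running_childs;

/* Hypothetical child body; a real fuzzer child would do its work here. */
static void child_work(void)
{
	_exit(EXIT_SUCCESS);
}

/*
 * Try to bring the number of children up to max_children.
 * Returns 0 on success, or when fork() failed but other children are
 * still running (the caller should reap one and call again).
 * Returns -1 only if fork() fails with no children running at all.
 */
static int fork_children_sketch(unsigned int max_children)
{
	while (running_childs < max_children) {
		pid_t pid = fork();

		if (pid == 0)
			child_work();	/* does not return */

		if (pid == -1) {
			/* We failed, wait for a child to exit before retrying. */
			if (running_childs > 0)
				return 0;

			fprintf(stderr, "couldn't create child! (%s)\n",
				strerror(errno));
			return -1;
		}
		running_childs++;
	}
	return 0;
}

/* Block until one child exits and account for it. Returns its pid or -1. */
static pid_t reap_one_child(void)
{
	int status;
	pid_t pid = waitpid(-1, &status, 0);

	if (pid > 0)
		running_childs--;
	return pid;
}
```

A caller would alternate the two: call fork_children_sketch(), then
reap_one_child(), then fork_children_sketch() again to refill the slot. Only a
-1 return, meaning not a single child could be created, needs to be fatal.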