On Mon, Apr 1, 2013 at 8:22 PM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, Apr 01, 2013 at 05:58:55PM -0600, Felipe Contreras wrote: >> Are you saying that even if we have stored the result of a waitpid >> command, if errno is EINTR, then we should still loop waitpid()? If >> so, I guess this would do the trick: > > Yes, I think that would work. Though I wonder if it is even worth > storing EINTR at all in the first place. It tells us nothing. In fact, > does storing any error condition really tell us anything? Probably not, I just tried to minimize the potential behavior changes. > The two states > we are interested in at this point are: > > 1. We have reaped the child via waitpid; here is its status. > > 2. We have not (either we did not try, it was not dead yet, or we were > not able to due to an error). We should now try it again. > > If we got EINTR the first time around, we would likely get the "real" > answer this time. If we get anything else (like EINVAL or ECHILD), then > we would get the same thing again calling waitpid() later. > >> > We now take argv0 into wait_or_whine. But I don't see it being used. >> > What's it for? >> >> It was there before: >> -static int wait_or_whine(pid_t pid, const char *argv0) >> +static int wait_or_whine(struct child_process *cmd, pid_t pid, const >> char *argv0) > > Ah, sorry, I misread the diff. We are adding "cmd", not "argv0". Yeah, which in fact was already there before. > That would trigger the rest of your code in the error case, which I > think was your original intent. But then we return "0" from > check_command. Is that right? > > There are three states we can be in from calling waitpid: > > 1. The process is dead. > > 2. The process is not dead. > > 3. We could not determine which because waitpid returned an error. > > It is clear that check_command is trying to tell its caller (1) or (2); > but what should it say in case of (3)? > > Naively, given how patch 2 uses it, I think it would actually make sense > for it to return 1. That is, the semantics are "return 0 if and only if > the pid is verified to be dead; otherwise return 1". I thought if there was an error that constituted a failure. > But if we know from reading waitpid(3) that waitpid should only fail due > to EINTR, or due to bogus arguments (e.g., a pid that does not exist or > has already been reaped), then maybe something like this makes sense: > > while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR) > ; /* nothing */ But we don't want to wait synchronously here, we just want to ping. > /* pid definitely still going */ > if (!waiting) > return 1; > > /* pid definitely died */ > if (waiting == cmd->pid) { > cmd->last_status.valid = 1; > cmd->last_status.status = status; > return 0; > } > > /* > * this should never happen, since we handed waitpid() a single > * pid, so it should either return that pid, 0, or an error. > */ > if (waiting > 0) > die("BUG: waitpid reported a random pid?"); > > /* > * otherwise, we have an error. Assume the pid is gone, since that > * is the only reason for waitpid to report a problem besides EINTR. > * We don't bother recording errno, since we can just repeat > * the waitpid again later. > */ > return 0; The rest makes sense. >> >> + cmd->last_wait.code = -1; >> >> + cmd->last_wait.failed_errno = failed_errno; >> >> + cmd->last_wait.status = status; >> > >> > Since we can only get here when waiting == cmd->pid, >> >> No, also when waiting < 0. > > After the fix above, yes; in the original we would always have exited > already. No: + if (waiting != cmd->pid) + return 1; If waiting < 0, waiting != cmd->pid, and therefore this return is not triggered, and there's only one more return at the end of the function. Cheers. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html