On 2022.04.28 14:46, Junio C Hamano wrote:
> Josh Steadmon <steadmon@xxxxxxxxxx> writes:
>
> > In rare cases, wait_or_whine() cannot determine a child process's exit
> > status (and will return -1 in this case). This can cause Git to issue
> > trace2 child_exit events despite the fact that the child is still
> > running.
>
> Rather, we do not even know if the child is still running when it
> happens, right?

Correct, if you'd like me to clarify the commit message I'll send a V2.

> It is curious what "rare cases" makes the symptom
> appear. Do we know?

Unfortunately, no. The quoted 80 million exit event instance was not
reproducible.

> The patch looks OK from the "we do not know the child exited in this
> case, so we shouldn't be reporting the child exit" point of view, of
> course. Having one event that started a child in the log and then
> having millions of events that reports the exit of the (same) child
> is way too broken. With this change, we remove these phoney exit
> events from the log.
>
> Do we know, for such a child process that caused these millions
> phoney exit events, we got a real exit event at the end?

We don't know. The trace log filled up the user's disk in this case, so
the log was truncated.

> Otherwise,
> we'd still have a similar problem in the opposite direction, i.e. a
> child has a start event recorded, many exit event discarded but the
> log lacks the true exit event for the child, implying that the child
> is still running because we failed to log its exit?

Yes, that is a weakness with this approach.

> >  int finish_command_in_signal(struct child_process *cmd)
> >  {
> >  	int ret = wait_or_whine(cmd->pid, cmd->args.v[0], 1);
> > -	trace2_child_exit(cmd, ret);
> > +	if (ret != -1)
> > +		trace2_child_exit(cmd, ret);
> >  	return ret;
> >  }
>
> Will queue; thanks.

Thanks, and sorry for the delayed reply, I've been out sick for a few
days.