Re: [PATCH] run-command: don't spam trace2_child_exit()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2022.05.03 10:59, Jeff Hostetler wrote:
> 
> 
> On 4/28/22 5:46 PM, Junio C Hamano wrote:
> > Josh Steadmon <steadmon@xxxxxxxxxx> writes:
> > 
> > > In rare cases, wait_or_whine() cannot determine a child process's exit
> > > status (and will return -1 in this case). This can cause Git to issue
> > > trace2 child_exit events despite the fact that the child is still
> > > running.
> 
> I'm curious what is causing the spurious return values.
> Could you instrument wait_or_whine() and see which of the
> if/else arms are causing the -1 to be returned?
> 
> That routine is rather complicated and looks like it has 3
> different ways that a -1 could be returned.

Unfortunately we haven't been able to reproduce the issue, so
instrumentation won't clarify the cause unless we randomly hit another
instance.


> > 
> > Rather, we do not even know if the child is still running when it
> > happens, right?  It is curious what "rare cases" makes the symptom
> > appear.  Do we know?
> > 
> > The patch looks OK from the "we do not know the child exited in this
> > case, so we shouldn't be reporting the child exit" point of view, of
> > course.  Having one event that started a child in the log and then
> > having millions of events that reports the exit of the (same) child
> > is way too broken.  With this change, we remove these phoney exit
> > events from the log.
> > 
> > Do we know, for such a child process that caused these millions
> > phoney exit events, we got a real exit event at the end?  Otherwise,
> > we'd still have a similar problem in the opposite direction, i.e. a
> > child has a start event recorded, many exit event discarded but the
> > log lacks the true exit event for the child, implying that the child
> > is still running because we failed to log its exit?
> > 
> > >   int finish_command_in_signal(struct child_process *cmd)
> > >   {
> > >   	int ret = wait_or_whine(cmd->pid, cmd->args.v[0], 1);
> > > -	trace2_child_exit(cmd, ret);
> > > +	if (ret != -1)
> > > +		trace2_child_exit(cmd, ret);
> > >   	return ret;
> > >   }
> 
> Since this is only called from pager.c and is used to setup the
> pager, I have to wonder if you're only getting these spurious events
> for the pager process or for any of the other random child processes.
> 
> And whether they are received while the pager is alive and working
> properly, or when you're trying to quit the pager or when the pager
> is trying to signal eof.

It was for a pager run by `git diff`. The only other child process in
this instance was `git status`. I believe the events were generated
while the user was not interacting with the process; IIRC the pager was
left running in a terminal overnight and they discovered the next
morning that their disk was filled up by trace logs.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux