RE: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1

Hi Randall,

On Thu, 14 Feb 2019, Randall S. Becker wrote:

> On February 14, 2019 17:39, Junio C Hamano wrote:
> > To: Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
> > Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@xxxxxxxxx>;
> > git@xxxxxxxxxxxxxxx; 'Max Kirillov' <max@xxxxxxxxxx>
> > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> > 
> > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
> > 
> > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > > I don't think this actually addresses the root cause of the hang.
> > 
> > Sigh.
> > 
> > > possible this is not the test that is failing, but actually the
> > > git-http-backend? The code is not in a loop, if that helps. It is not
> > > consuming any significant cycles. I don't know that part of the code
> > > at all, sadly. The code is here:
> > >
> > > * in the operating system from here up *
> > >   cleanup_children + 0x5D0 (UCr)
> > >   cleanup_children_on_exit + 0x70 (UCr)
> > >   git_atexit_dispatch + 0x200 (UCr)
> > >   __process_atexit_functions + 0xA0 (DLL zcredll)
> > >   CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > >   exit + 0x2A0 (DLL zcrtldll)
> > >   die_webcgi + 0x240 (UCr)
> > >   die_errno + 0x360 (UCr)
> > >   write_or_die + 0x1C0 (UCr)
> > >   end_headers + 0x1A0 (UCr)
> > >   die_webcgi + 0x220 (UCr)
> > >   die + 0x320 (UCr)
> > >   inflate_request + 0x520 (UCr)
> > >   run_service + 0xC20 (UCr)
> > >   service_rpc + 0x530 (UCr)
> > >   cmd_main + 0xD00 (UCr)
> > >   main + 0x190 (UCr)
> > >
> > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten or
> > > neglected somewhere between the test, perl, and git-http-backend.
> > 
> > So we are trying to die(), which actually happens in die_webcgi(), and
> > then try to write some message _but_ notice an error inside
> > write_or_die() and try to exit because we do not want to recurse forever
> > trying to die, giving a message to say how/why we died, and die because
> > failing to give that message, forever.
> > 
> > But in our attempt to exit(), we try to "cleanup children" and that is
> > what gets stuck.
> > 
> > One big difference before and after the /dev/zero change is that the
> > process is now on a downstream of the pipe.  If we prepare a large file
> > with a finite size full of NULs and replace /dev/zero with it, instead of
> > feeding NULs from the pipe, would it change the equation?
> 
> Doubtful. The processes are still around, and are waiting on read but not
> actively reading (CPU time is not going up, so we're not reading an infinite
> stream). To me, this is a pipe situation where there is simply nothing
> waiting on the pipe (maybe a flush missing?). I'm grasping at straws
> without knowing the actual process architecture of the test to debug it.

So could you try with this patch?

-- snipsnap --
diff --git a/http-backend.c b/http-backend.c
index d5cea0329a..7c1b4a2555 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name, int out, int buffer_input, ss
 
 done:
 	git_inflate_end(&stream);
+	close(0);
 	close(out);
 	free(full_request);
 }
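
For illustration only, here is a minimal standalone sketch of one effect the
added close(0) could have. It assumes that whatever feeds the request body is
still writing into git-http-backend's stdin pipe when inflate_request() bails
out; that is a guess, not something confirmed on NonStop. Under plain POSIX
pipe semantics, once the read end is closed, a writer gets EPIPE (or SIGPIPE)
instead of sitting on a pipe nobody drains. The helper name
write_errno_after_reader_close() is hypothetical and is not part of git.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git code): create a pipe, optionally close its
 * read end first, then write one byte to the write end.
 * Returns 0 if the write succeeded, the writer's errno if it failed,
 * or -1 if the pipe could not be created.
 */
static int write_errno_after_reader_close(int close_reader_first)
{
	int fds[2];
	int result = 0;
	void (*old_handler)(int);

	if (pipe(fds) < 0)
		return -1;

	/* Ignore SIGPIPE so a failed write() reports EPIPE instead of killing us. */
	old_handler = signal(SIGPIPE, SIG_IGN);
	assert(old_handler != SIG_ERR);

	/* This plays the role of close(0) in the reader process. */
	if (close_reader_first)
		close(fds[0]);

	/* One byte always fits in an empty pipe buffer, so this cannot block. */
	if (write(fds[1], "x", 1) < 0)
		result = errno;

	/* Restore the previous SIGPIPE disposition and release the descriptors. */
	signal(SIGPIPE, old_handler);
	if (!close_reader_first)
		close(fds[0]);
	close(fds[1]);
	return result;
}
```

Calling write_errno_after_reader_close(1) models the patched case (reader
closed, writer fails with EPIPE), and write_errno_after_reader_close(0)
models the unpatched case, where the write is accepted even though nobody
ever reads it.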



