Hi Randall,

On Thu, 14 Feb 2019, Randall S. Becker wrote:

> On February 14, 2019 17:39, Junio C Hamano wrote:
> > To: Randall S. Becker <rsbecker@xxxxxxxxxxxxx>
> > Cc: 'Johannes Schindelin via GitGitGadget' <gitgitgadget@xxxxxxxxx>;
> > git@xxxxxxxxxxxxxxx; 'Max Kirillov' <max@xxxxxxxxxx>
> > Subject: Re: [PATCH 0/1] Fix hang in t5562, introduced in v2.21.0-rc1
> >
> > "Randall S. Becker" <rsbecker@xxxxxxxxxxxxx> writes:
> >
> > > Unfortunately, subtest 13 still hangs on NonStop, even with this
> > > patch, so our Pipeline still hangs. I'm glad it's better on Azure, but
> > > I don't think this actually addresses the root cause of the hang.
> >
> > Sigh.
> >
> > > possible this is not the test that is failing, but actually the
> > > git-http-backend? The code is not in a loop, if that helps. It is not
> > > consuming any significant cycles. I don't know that part of the code
> > > at all, sadly. The code is here:
> > >
> > > * in the operating system from here up *
> > > cleanup_children + 0x5D0 (UCr)
> > > cleanup_children_on_exit + 0x70 (UCr)
> > > git_atexit_dispatch + 0x200 (UCr)
> > > __process_atexit_functions + 0xA0 (DLL zcredll)
> > > CRE_TERMINATOR_ + 0xB50 (DLL zcredll)
> > > exit + 0x2A0 (DLL zcrtldll)
> > > die_webcgi + 0x240 (UCr)
> > > die_errno + 0x360 (UCr)
> > > write_or_die + 0x1C0 (UCr)
> > > end_headers + 0x1A0 (UCr)
> > > die_webcgi + 0x220 (UCr)
> > > die + 0x320 (UCr)
> > > inflate_request + 0x520 (UCr)
> > > run_service + 0xC20 (UCr)
> > > service_rpc + 0x530 (UCr)
> > > cmd_main + 0xD00 (UCr)
> > > main + 0x190 (UCr)
> > >
> > > Best guess is that a signal (SIGCHLD?) is possibly getting eaten or
> > > neglected somewhere between the test, perl, and git-http-backend.
> >
> > So we are trying to die(), which actually happens in die_webcgi(), and
> > then try to write some message _but_ notice an error inside
> > write_or_die() and try to exit because we do not want to recurse forever
> > trying to die, giving a message to say how/why we died, and die because
> > failing to give that message, forever.
> >
> > But in our attempt to exit(), we try to "cleanup children" and that is
> > what gets stuck.
> >
> > One big difference before and after the /dev/zero change is that the
> > process is now on a downstream of the pipe. If we prepare a large file
> > with a finite size full of NULs and replace /dev/null with it, instead
> > of feeding NULs from the pipe, would it change the equation?
>
> Doubtful. The processes are still around, and are waiting on read but not
> actively reading (CPU time is not going up, so we're not reading an infinite
> stream). To me, this is a pipe situation where there is simply nothing
> waiting on the pipe (maybe a flush missing?). I'm grasping at straws
> without knowing the actual process architecture of the test to debug it.

So could you try with this patch?

-- snipsnap --
diff --git a/http-backend.c b/http-backend.c
index d5cea0329a..7c1b4a2555 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -427,6 +427,7 @@ static void inflate_request(const char *prog_name, int out, int buffer_input, ss
 done:
 	git_inflate_end(&stream);
+	close(0);
 	close(out);
 	free(full_request);
 }
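For reference, what closing the read end of a pipe does to a process writing into it can be seen in isolation with a small standalone program. This is only a sketch and not Git code: the helper name write_after_reader_closed() is made up for illustration, and whether this is the mechanism behind the hang on NonStop is an assumption that the test run would have to confirm.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Git.
 *
 * Creates a pipe, closes its read end, then writes one byte to the
 * write end. Returns the errno of the failed write(), 0 if the write
 * unexpectedly succeeded, or -1 if the pipe could not be created.
 */
static int write_after_reader_closed(void)
{
	int fds[2];
	int err = 0;
	void (*old_handler)(int);

	/* Ignore SIGPIPE so write() reports an error instead of killing us. */
	old_handler = signal(SIGPIPE, SIG_IGN);

	if (pipe(fds) < 0) {
		signal(SIGPIPE, old_handler);
		return -1;
	}

	/* The "reader" goes away: no descriptor for the read end remains. */
	close(fds[0]);

	/* With no reader left, the write fails right away rather than blocking. */
	if (write(fds[1], "\0", 1) < 0)
		err = errno;

	close(fds[1]);
	signal(SIGPIPE, old_handler);	/* restore the previous disposition */
	return err;
}
```

On a POSIX system the helper should return EPIPE. The point is that a writer only gets this error once every descriptor for the read end is closed; as long as one stays open and nobody reads, the writer can block once the pipe buffer fills up.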