On February 15, 2019 15:37, Max Kirillov wrote:
> On Fri, Feb 15, 2019 at 02:02:13PM +0100, SZEDER Gábor wrote:
> > I haven't yet seen that hang in the wild and couldn't reproduce it on
> > purpose, but there is definitely something fishy with t5562 even on
> > Linux and even without that perl generate_zero_bytes helper.
> >
> > It won't show most of the processes run in the tests, because they are
> > just too fast and short-lived.  However, occasionally it does show a
> > stuck git process, which is shown as <defunct> in regular 'ps aux'
> > output:
> >
> >   szeder  5722  0.0  0.0  0  0 pts/16  Z+  13:36  0:00 [git] <defunct>
> >
> > Note that this is not a "proper" hang, in the sense that this process
> > is not stuck forever, but only for about 1 minute
>
> This is probably because of SIGCHILD comes before "sleep". I believe this is
> unrelated to the hang issue. The hang issue looks like something is wrong
> with cleanu_children(), or maybe in the child which it tries to kill and wait,
> not in tests.
>
> As for this zombie issue, could be fixed with, for example, more busy wait
> like the following. It may with some bigger probability miss SIGCHILD to the
> first sleep because there is a bit more to do before it. But the penalty is only
> 1 second now, and as it still happens rarely there seems to be no visible
> degradation.
>
> --- 8< -----------
> diff --git a/t/t5562/invoke-with-content-length.pl b/t/t5562/invoke-with-content-length.pl
> index 0943474af2..257e280e3b 100644
> --- a/t/t5562/invoke-with-content-length.pl
> +++ b/t/t5562/invoke-with-content-length.pl
> @@ -29,7 +29,12 @@
>  }
>  print $out $body_data or die "Cannot write data: $!";
>
> -sleep 60; # is interrupted by SIGCHLD
> +my $counter = 0;
> +while (not $exited and $counter < 60) {
> +	sleep 1;
> +	$counter = $counter + 1;
> +}
> +
>  if (!$exited) {
>  	close($out);
>  	die "Command did not exit after reading whole body";

From the trace I found in perl, we have gone past sleep and are hung at
close($out). Commenting out the close() does nothing because perl still hangs
on an implied close resulting from the exception thrown by die(). See my other
post on adding GIT_TRACE and the changes resulting from that.

Sadly, the fix does not change the results. In fact, it makes the hang far
more likely. Subtests 6, 7 and 8 fail here, at close():

  waitpid + 0x130 (SLr)
  $n_EnterPriv + 0x280 (Milli)
  Perl_wait4pid + 0x130 (UCr)
  Perl_my_pclose + 0x4C0 (UCr)
  Perl_io_close + 0x180 (UCr)
  Perl_do_close + 0x620 (UCr)
  Perl_pp_close + 0xA70 (UCr)
  Perl_runops_standard + 0xF0 (UCr)
  S_run_body + 0x870 (UCr)
  perl_run + 0x2D0 (UCr)
  main + 0x3D0 (UCr)
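
To illustrate why close() ends up in waitpid: closing a handle that was opened
as a pipe to a child makes perl wait for that child, so a child that never
exits blocks the close. Below is a minimal sketch of one way to avoid that,
by polling with a non-blocking waitpid and killing the child before the
close. This is not the t5562 helper and not a proposed patch; wait_or_kill()
is a hypothetical name, the "sleep 600" child is only a stand-in for a command
that does not exit, and I have not tried this on the platform where the hang
shows up.

```perl
#!/usr/bin/perl
# Sketch only: bound the wait for a piped child, then kill it, so that the
# implicit waitpid() inside close() has nothing left to block on.
use strict;
use warnings;
use POSIX ":sys_wait_h";    # provides WNOHANG

# Hypothetical helper (not part of t5562): poll for up to $timeout seconds,
# then send SIGKILL and reap the child. Returns the raw wait status.
sub wait_or_kill {
	my ($pid, $timeout) = @_;
	for (my $i = 0; $i < $timeout * 10; $i++) {
		# With WNOHANG, waitpid returns 0 while the child is still running.
		my $reaped = waitpid($pid, WNOHANG);
		return $? if $reaped == $pid;
		select(undef, undef, undef, 0.1);    # sleep 100 ms
	}
	kill 'KILL', $pid;
	waitpid($pid, 0);    # blocking, but the child has just been killed
	return $?;
}

# Stand-in for a child that never exits on its own (assumption for the demo).
my $pid = open(my $out, "|-", "sleep", "600") or die "Cannot fork: $!";

my $status = wait_or_kill($pid, 1);

# The child is already reaped, so this close() does not wait for it.
# It returns false here because there is no child left to collect.
close($out);

print "child terminated by signal ", $status & 127, "\n";
```

The low 7 bits of the wait status hold the terminating signal number, which is
how the last line tells a killed child from one that exited by itself.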