Re: [PATCH] Portability: returning void

On Tue, Mar 29, 2011 at 06:49:55PM -0500, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > The problem is that the sleeps hang around for 100 seconds, and they are
> > connected to the test script's stdout. It works to run "./t0081-*"
> > because bash sees the SIGCHLD and knows the script is done. But the
> > prove program actually ignores the SIGCHLD and waits until stdout and
> > stderr on the child are closed.
> 
> Strange.  Why would prove tell its children to ignore SIGCHLD and
> SIGTERM?

No, you misunderstand. It is prove itself that ignores the SIGCHLD. It
is stuck in the loop in TAP::Parser::Iterator::Process::_next. It has
gotten SIGCHLD, but it keeps blocking waiting to get EOF on the child's
stdout.
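To illustrate the point (a sketch, not prove's actual code): a reader blocked on a pipe only sees EOF once every process holding the write end has closed it, even if its direct child has already exited and been reaped. Below, Python's subprocess.Popen stands in for prove and `bash -c "sleep 1 &"` stands in for the test script; the helper name `time_exit_and_eof` is hypothetical, and bash is assumed to be installed.

```python
import subprocess
import time


def time_exit_and_eof(script):
    """Run `bash -c script` with its stdout on a pipe (roughly how a TAP
    harness runs a test script) and return a pair:
    (seconds until the shell exits, seconds until the pipe reports EOF)."""
    start = time.monotonic()
    proc = subprocess.Popen(["bash", "-c", script], stdout=subprocess.PIPE)
    # The shell exits as soon as it has backgrounded the sleep; wait() reaps it.
    proc.wait()
    exited = time.monotonic() - start
    # read() returns only at EOF, i.e. once *every* process holding the
    # write end of the pipe (including backgrounded grandchildren) closes it.
    proc.stdout.read()
    eof = time.monotonic() - start
    proc.stdout.close()
    return exited, eof


if __name__ == "__main__":
    # The backgrounded sleep inherits the shell's stdout, i.e. the pipe.
    exited, eof = time_exit_and_eof("sleep 1 &")
    print(f"shell exited after {exited:.2f}s, EOF after {eof:.2f}s")
```

Changing the script to "sleep 1 >/dev/null &" makes the two times nearly equal, because the sleep no longer holds the pipe open.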

> > Double-weird is that if you "strace" the prove process, it will still
> > hang. But if you "strace -f", it _won't_ hang.
> 
> Well, it hangs for me. :)  The strangest aspect is that after 100
> seconds, all is well again, which suggests that there's more happening
> than an unreaped process.

Doesn't that point to an unreaped process? After 100 seconds the sleep
process closes, prove gets EOF, and it completes. Lowering the "100" to
"1" caused a 1-second hang for me.

> | 19398 18:31:12 exit_group(0)            = ?
> | 19397 18:31:12 <... select resumed> )   = ? ERESTARTNOHAND (To be restarted)
> | 19397 18:31:12 --- SIGCHLD (Child exited) @ 0 (0) ---
> | 19397 18:31:12 select(8, [4 6], NULL, NULL, NULL <unfinished ...>
> 
> The test script exits, but "prove" is stuck in select and does not
> want to start reaping yet.  So presumably the test script's children
> are adopted by init.  We wait around 13 seconds, and then:

Right, prove is stuck in the select. You can see it even got SIGCHLD
above, and if you check your process list, you will probably see the
defunct bash process. But instead of realizing its child has died, it
insists on waiting until the pipe is closed. Nothing has to be adopted
by init. There are simply still processes with the pipe open.

> | 19424 18:31:25 <... nanosleep resumed> NULL) = 0
> | 19424 18:31:25 close(1)                 = 0
> | 19424 18:31:25 close(2)                 = 0
> | 19424 18:31:25 exit_group(0)            = ?
> | 19422 18:31:25 <... wait4 resumed> 0x7fff65d1ee6c, 0, NULL) = ? ERESTARTSYS (To be restarted)
> | 19422 18:31:25 --- SIGTERM (Terminated) @ 0 (0) ---
> 
> The first sleep wakes up and dies.  The corresponding subshell
> wakes up, reaps the child, and finally accepts SIGTERM.

Hrm. That's different from what happens on my system. On my system, the
bash process is _already_ dead during the whole procedure, and it is
just the stray sleeps that keep prove waiting.

Maybe different bash versions? Mine is 4.1.5(1) (from Debian unstable,
bash_4.1-3).

> | 19397 18:31:26 <... select resumed> )   = 2 (in [4 6])
> | 19397 18:31:26 read(4, "", 65536)       = 0
> | 19397 18:31:26 read(6, "", 65536)       = 0
> | 19397 18:31:26 wait4(19398, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 19398
> 
> Now "prove" wakes up again.

Right, because the pipe is finally closed.

Did you try my 5>/dev/null patch? With it, I get no hang at all.
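A sketch of why a redirect like 5>/dev/null can help, assuming (this message does not show it) that the test script has duplicated its stdout onto descriptor 5, e.g. with "exec 5>&1". Redirecting only stdout and stderr of a background job is then not enough, because the job still inherits descriptor 5, which points at the harness's pipe. The helper name and both scripts are hypothetical, and bash is assumed to be installed (POSIX leaves it unspecified whether other shells keep such descriptors open across exec).

```python
import subprocess
import time


def seconds_until_eof(script):
    """Run `bash -c script` with stdout on a pipe and return how long a
    reader waits before read() reports EOF."""
    start = time.monotonic()
    proc = subprocess.Popen(["bash", "-c", script], stdout=subprocess.PIPE)
    proc.stdout.read()  # returns only once all write ends are closed
    elapsed = time.monotonic() - start
    proc.stdout.close()
    proc.wait()
    return elapsed


# Hypothetical test-script stand-ins: fd 5 is a copy of stdout (the pipe).
# The sleep's stdout/stderr go to /dev/null, but it still inherits fd 5.
STILL_HANGS = "exec 5>&1; sleep 1 >/dev/null 2>&1 &"
# Pointing fd 5 at /dev/null too leaves no stray writer on the pipe.
NO_HANG = "exec 5>&1; sleep 1 >/dev/null 2>&1 5>/dev/null &"

if __name__ == "__main__":
    print(f"without 5>/dev/null: {seconds_until_eof(STILL_HANGS):.2f}s")
    print(f"with 5>/dev/null:    {seconds_until_eof(NO_HANG):.2f}s")
```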

-Peff