Re: t7006 sometimes hangs in cronjobs on OS X

Jeff King wrote:

>   1. In your Copy.pm log above, it says read gives it 4 characters. But
>      "hi\n" has only 3.

Yes, it's "hi\r\n".
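
A likely source of the fourth byte is the terminal's output processing:
with the ONLCR flag set, each "\n" written to the pty slave reaches the
master as "\r\n". A minimal Python sketch, assuming a POSIX system with
ptys available; read_through_pty is a hypothetical helper and not part
of test-terminal.perl:

```python
import os
import termios


def read_through_pty(payload: bytes) -> bytes:
    """Write payload (which must end in a newline) to a pty slave and
    return what the master side reads back."""
    master_fd, slave_fd = os.openpty()
    try:
        # Force NL -> CR-NL output translation so the result does not
        # depend on the platform's default terminal settings.
        attrs = termios.tcgetattr(slave_fd)
        attrs[1] |= termios.OPOST | termios.ONLCR  # index 1 is c_oflag
        termios.tcsetattr(slave_fd, termios.TCSANOW, attrs)

        os.write(slave_fd, payload)

        # Read until the final newline arrives; a single read may
        # return only part of the data.
        data = b""
        while not data.endswith(b"\n"):
            data += os.read(master_fd, 1024)
        return data
    finally:
        os.close(slave_fd)
        os.close(master_fd)


print(read_through_pty(b"hi\n"))
```

Three bytes go in on the slave side and, with ONLCR in effect, four
come out on the master side.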

> I would first try this patch:
[...]
> +++ b/t/test-terminal.perl
> @@ -15,6 +15,7 @@ sub start_child {
>  		open STDOUT, ">&", $out;
>  		open STDERR, ">&", $err;
>  		close $out;
> +		close $err;
>  		exec(@$argv) or die "cannot exec '$argv->[0]': $!"

Good idea.  No change, alas (and likewise with the change to close
the pty master in the child).  It seems I have some reading to do.
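
For context, the idea behind closing the extra handle is that a reader
sees end-of-file only once every descriptor for the write side has been
closed, so one leftover copy can keep a reader blocked. A minimal
Python sketch of that rule, using a pipe as a stand-in for the pty
(ptys report hangup differently from pipes, and differently across
systems); the function names are hypothetical:

```python
import os
import select


def at_eof(read_fd: int) -> bool:
    """Return True if read_fd reports end-of-file within a short timeout."""
    ready, _, _ = select.select([read_fd], [], [], 0.1)
    return bool(ready) and os.read(read_fd, 1) == b""


def eof_before_and_after_closing_original() -> tuple:
    read_fd, original = os.pipe()  # `original` plays the role of $err
    duplicate = os.dup(original)   # like: open STDERR, ">&", $err
    os.close(duplicate)            # the writer is done with its copy
    before = at_eof(read_fd)       # `original` is still open: no EOF yet
    os.close(original)             # like the added: close $err
    after = at_eof(read_fd)        # last write end is gone: EOF
    os.close(read_fd)
    return before, after


print(eof_before_and_after_closing_original())
```

The first element is the state while the original handle is still open,
the second the state after it is closed.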

Jonathan

> and then try this more drastic one:
[...]
> --- a/t/test-terminal.perl
> +++ b/t/test-terminal.perl
> @@ -12,9 +12,12 @@ sub start_child {
[...]
> @@ -69,7 +72,7 @@ if ($#ARGV < 1) {
>  }
>  my $master_out = new IO::Pty;
>  my $master_err = new IO::Pty;
> -my $pid = start_child(\@ARGV, $master_out->slave, $master_err->slave);
> +my $pid = start_child(\@ARGV, $master_out, $master_err);

Runs through ~1000 iterations instead of 100 before hanging.

> Also, I don't know what kind of support you have for stuff like lsof,
> but in theory we should be able to get a hung process, find the open
> descriptor for the pty using lsof, match that descriptor with the other
> end of the pty, and then see which processes have that pty still open.

Trial 1
~~~~~~~
 PID 49145 (which has successfully pumped stdout):

  0	/dev/ttys001
  1 write-only	out.1707
  2 write-only	out.1707
  3	/dev/ptmx	@ offset 4
  5 write-only	debug.log

 PID 49147 (which is stuck in sysread trying to read stderr):

  0	/dev/ttys001
  1 write-only	out.1707
  2 write-only	out.1707
  5 write-only	debug.log
  6	/dev/ptmx	@ offset 0

Trial 2
~~~~~~~
 PID 51091 (which is stuck in sysread trying to read stdout):

  0	/dev/ttys001
  1 write-only	out.2017
  2 write-only	out.2017
  3	/dev/ptmx	@ offset 4
  5 write-only	debug.log

 PID 51093 (which successfully pumped stderr) is a zombie

(echo was a zombie in both cases.)

>> Redirecting stderr by using 'xsendfile("elsewhere", $err);' avoids
>> trouble.
>
> That seems doubly weird, since you are changing the _output_, not the
> input. But the input is what is causing the hang.

False alarm --- after about 2500 iterations it hangs.  Probably just
changed the timing.

>> Sometimes output includes some streams of null bytes, which makes me
>> suspect something awry in the kernel.
>
> Yuck.

Was my mistake --- apparently I was writing files with holes.  Now I
send debug output to a separate file with O_APPEND and it hasn't
happened again.
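
One way such holes can arise (an assumption for illustration; the
exact cause here is not stated above): a writer keeps its own file
offset while another opener truncates the file, so the next write
lands past the new end and the gap reads back as null bytes. With
O_APPEND every write goes to the current end of the file instead. A
minimal Python sketch; the function name is hypothetical and the
debug.log file is a temporary one:

```python
import os
import tempfile


def write_after_truncate(append: bool) -> bytes:
    """Write, let a second opener truncate the file, write again,
    and return the final file contents."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_APPEND if append else 0)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "debug.log")

        first = os.open(path, flags, 0o600)
        os.write(first, b"AAAA")  # offset of `first` is now 4

        # A second opener truncates the file back to length 0.
        second = os.open(path, flags | os.O_TRUNC, 0o600)
        os.close(second)

        # Without O_APPEND this write lands at the stale offset 4,
        # leaving a 4-byte hole; with O_APPEND it lands at offset 0.
        os.write(first, b"BB")
        os.close(first)

        with open(path, "rb") as f:
            return f.read()


print(write_after_truncate(append=False))
print(write_after_truncate(append=True))
```

Only the first call returns contents with leading null bytes.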