O_NONBLOCK: should I blame git or ssh?

We have a parallel build that occasionally fails with the error message
"make: write error".  Make prints that message as it exits if it has
detected errors while writing to stdout.  The error it is encountering is
EAGAIN, which implies that something has made its stdout non-blocking.  As
far as I've been able to tell so far, this happens while make is running the
command "git fetch --quiet --tags".  Once that command finishes, stdout goes
back to being blocking, but since this is a parallel build, make is doing
other work while the git command is running and may attempt to write to
stdout during that time.

By stracing this git command, I can see it running the subcommand

ssh -p 29418 user@gerrit.domain "git-upload-pack '/repo'"

and I can see that ssh command doing this:

39828 dup(0)                            = 5
39828 dup(1)                            = 6
39828 dup(2)                            = 7
39828 ioctl(5, TCGETS, 0x7ffea2880800)  = -1 ENOTTY (Inappropriate ioctl for device)
39828 fcntl(5, F_GETFL)                 = 0 (flags O_RDONLY)
39828 fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
39828 ioctl(6, TCGETS, 0x7ffea2880800)  = -1 ENOTTY (Inappropriate ioctl for device)
39828 fcntl(6, F_GETFL)                 = 0x1 (flags O_WRONLY)
39828 fcntl(6, F_SETFL, O_WRONLY|O_NONBLOCK) = 0
39828 ioctl(7, TCGETS, {B38400 opost isig icanon echo ...}) = 0
39828 fcntl(5, F_SETFD, FD_CLOEXEC)     = 0
39828 fcntl(6, F_SETFD, FD_CLOEXEC)     = 0
39828 fcntl(7, F_SETFD, FD_CLOEXEC)     = 0
...
39828 ioctl(0, TCGETS, 0x7ffea28806e0)  = -1 ENOTTY (Inappropriate ioctl for device)
39828 fcntl(0, F_GETFL)                 = 0x800 (flags O_RDONLY|O_NONBLOCK)
39828 fcntl(0, F_SETFL, O_RDONLY)       = 0
39828 ioctl(1, TCGETS, 0x7ffea28806e0)  = -1 ENOTTY (Inappropriate ioctl for device)
39828 fcntl(1, F_GETFL)                 = 0x801 (flags O_WRONLY|O_NONBLOCK)
39828 fcntl(1, F_SETFL, O_WRONLY)       = 0
39828 ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0

So ssh has dup'd descriptors 0, 1, and 2, and then turned on the O_NONBLOCK flag on
the copies of stdin and stdout.  O_NONBLOCK is a file status flag, so it is shared by
every descriptor that refers to the same open file description, the originals included.
You can see afterwards that ssh reads the flags on descriptors 0 and 1, and both have
O_NONBLOCK set.  It then clears that bit.  So it set O_NONBLOCK near the beginning of
its run and cleared it near the end.
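
The sharing can be reproduced outside of git and ssh.  The following is a minimal
sketch, not taken from either program: the function name
child_nonblock_visible_in_parent is made up for illustration, and a pipe stands in
for make's stdout.  A child process dups an inherited descriptor and sets O_NONBLOCK
on the copy only, as ssh does in the trace, and the parent then checks the flags on
its own descriptor.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if O_NONBLOCK set by a child on a dup'd
 * copy of an inherited descriptor is visible on the parent's descriptor,
 * 0 if it is not, and -1 on error. */
int child_nonblock_visible_in_parent(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: dup the inherited write end and set O_NONBLOCK on the
         * copy only, mirroring the dup()/F_SETFL sequence in the trace. */
        int copy = dup(p[1]);
        int fl = (copy < 0) ? -1 : fcntl(copy, F_GETFL);
        if (fl < 0 || fcntl(copy, F_SETFL, fl | O_NONBLOCK) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: wait for the child, then inspect our own descriptor. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int fl = fcntl(p[1], F_GETFL);
        if (fl >= 0)
            result = (fl & O_NONBLOCK) ? 1 : 0;
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

Because the child exits here without clearing the flag, the parent's descriptor stays
non-blocking, which is the state make would be in for as long as ssh is running.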

Should this be considered a git bug or an ssh bug or something else?

I thought I had finally figured out exactly what is happening, but while writing this
I realized I'm not sure why my workaround appears to work.  My workaround is to pipe
make's stdout into a simple program that reads make's output and writes it to wherever
make used to write, except that it does a select() on descriptor 1 before each write
and makes sure to handle short counts.  But if it's the ssh started indirectly by make
that is messing with O_NONBLOCK, presumably it would be setting O_NONBLOCK on the write
side of the pipe that make now writes to, so make should still be encountering EAGAIN
errors.  And yet my workaround does seem to work.

Thanks for any light you can shed on this.




