Re: O_NONBLOCK: should I blame git or ssh?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Sep 07, 2019 at 11:50:41PM +0000, Douglas Graham wrote:

> We have a parallel build that occasionally fails with the error message
> "make: write error".  Make prints that error message as it is exiting when
> it detects that it has seen errors while writing to stdout.  The error it
> is enountering is an EAGAIN error, which implies that something has made
> its stdout non-blocking.  As far as I've been able to tell so far, this is
> occurring while make is running the command "git fetch --quiet --tags".
> Once that command finishes, stdout goes back to being blocking but since
> this is a parallel build, make is doing other work while this git command
> is running, and may attempt to write to stdout during that time.

As I understand it, the mechanism you describe here for your symptoms is
plausible, because O_NONBLOCK is not attached to a particular
descriptor, but a POSIX "file description". So a sub-process setting
flags on its stdout can impact the parent process.

But...

> By stracing this git command, I can see it running subcommand
> 
> ssh -p 29418 user@gerrit.domain "git-upload-pack '/repo'"

I don't think this can directly be the culprit, because that ssh's
stdout will be hooked to a pipe talking to Git, not to the original
stdout of "git fetch". It should not have even received a descriptor
that is a copy of the original stdout (nor stdin), since those would
have been closed as part of the fork+exec.

The child ssh _does_ have access to the original stderr, which could
plausibly be a dup of the original stdout. But your strace shows ssh
setting the flag only for stdin/stdout.

> Should this be considered a git bug or an ssh bug or something else?

I'm not convinced that what you've found is the root cause of your
problem. But even if it is, I don't think you can fault Git. We never
touch the descriptor flags ourselves, and ssh is a black box to us.

> I thought I had finally figured out exactly what is happening but while writing this,
> now I'm not sure why my workaround appears to be working.  My workaround is to pipe
> make's stdout into a simple program that reads make's output and writes it to where
> make uses to write to, except it does a select() on descriptor 1 before writing, and
> it makes sure to handle short counts.  But now I'm thinking that if it's the ssh started
> indirectly by make that is messing with O_NONBLOCK, presumably it would be messing
> with O_NONBLOCK on the write side of the pipe that make writes to, so make should
> still be encountering EAGAIN errors. And yet my workaround does seem to work.

It's possible that it helps because your pipe-reading program is
(almost) always trying to read, and thus "make" on the writing end of
the pipe never sees EAGAIN. As long as your program reads the output
before the pipe buffer fills up, you'd be fine. Where did the output go
in your original? If to a regular file, I'd expect O_NONBLOCK to have no
effect anyway.

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux