On Sun, Sep 08, 2019 at 02:18:15PM +0000, Douglas Graham wrote:

> When I collected that strace output, I had stdout redirected to a pipe to my
> workaround program, but I did not redirect stderr. So ssh made stdout non-blocking,
> but since stderr was still connected to my terminal, it didn't touch that. But when
> this build is run under Jenkins, both stdout and stderr are connected to a pipe that
> Jenkins creates to collect output from the build. I assume that when git runs ssh, it
> does not redirect ssh's stderr to its own pipe, it only redirects stdout. So I think
> ssh will be messing with both pipes when this build is run under Jenkins.

OK, that makes sense.

> Now that I have a fairly good understanding of what's happening, I think I can work
> around this occasional error by redirecting git's stderr to a file or something like
> that, but it's taken us a long time to figure this out, so I wonder if a more permanent
> fix shouldn't be implemented, so others don't run into the same problem. A google for
> "make: write error" indicates that we're not the first to have this problem with
> parallel builds, although in the other cases I've found, a specific version of the
> Linux kernel was being blamed. Maybe that was a different problem.
>
> I guess git could work around this by redirecting stderr, but the real problem is probably
> with ssh, although it's not clear to me what it should do differently. It does seem
> somehow backwards to me that it only makes a descriptor non-blocking if it doesn't
> refer to a TTY, but it does the same thing in at least three different places so I guess
> that's not a mistake.

Where would git redirect the stderr to? We definitely want to make sure it
goes to our original stderr, since it can have useful content for the
user to see. We could make a new pipe and then pump the output back to
our original stderr.
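
To make that pipe-and-pump idea concrete, here is a rough sketch in Python. It is only an illustration, not anything git does today, and it assumes a POSIX system. The names `run_with_pumped_stderr` and `CHILD` are made up for the example, and the child is a stand-in for ssh that just reports what its own stderr looks like.

```python
import os
import subprocess
import sys
import threading

# Hypothetical stand-in for ssh: reports whether its fd 2 is a terminal.
CHILD = [
    sys.executable,
    "-c",
    "import os, sys; sys.stderr.write('stderr isatty: %s\\n' % os.isatty(2))",
]


def run_with_pumped_stderr(argv, sink):
    """Run argv with stderr on a private pipe and copy that pipe into sink.

    sink is any binary writable object, e.g. sys.stderr.buffer. The child
    never sees our real stderr, so any flags it sets (such as O_NONBLOCK)
    land on the private pipe instead.
    """
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(argv, stderr=write_fd)
    # Drop our copy of the write end, or the pump below never sees EOF.
    os.close(write_fd)

    def pump():
        with os.fdopen(read_fd, "rb", buffering=0) as src:
            while True:
                chunk = src.read(4096)
                if not chunk:  # EOF: every write end has been closed
                    break
                sink.write(chunk)
                sink.flush()

    pumper = threading.Thread(target=pump)
    pumper.start()
    status = proc.wait()
    pumper.join()
    return status
```

Calling `run_with_pumped_stderr(CHILD, sys.stderr.buffer)` from a terminal still shows the child's message, but the child reports that its stderr is not a tty, because from its side fd 2 is now a pipe.
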
But besides being complex, that fools the downstream programs about
whether stderr is a tty (I don't know whether ssh cares, but certainly
git itself uses that to decide on some elements of the output, mostly
progress meters).

So I think it would make more sense to talk to ssh folks about why this
momentary O_NONBLOCK setting happens, and if it can be avoided.

-Peff