On Fri, Aug 30, 2019 at 02:10:05PM +0200, SZEDER Gábor wrote: > On Fri, Aug 30, 2019 at 12:06:30AM +0200, SZEDER Gábor wrote: > > On Thu, Aug 29, 2019 at 11:58:18PM +0200, SZEDER Gábor wrote: > > > On Thu, Aug 29, 2019 at 10:38:05AM -0400, Jeff King wrote: > > > > So any fixes there have to happen on the client side. I am still > > > > confused about why the client is writing in this case, per the argument > > > > in 014ade7484 (upload-pack: send ERR packet for non-tip objects, > > > > 2019-04-13). It would be nice to use GIT_TRACE_PACKET to see what it's > > > > trying to write, but I still haven't been able to reproduce the issue. > > > > > > It's the "done" line: > > > > > > + cat trace-packet > [...] > > > packet: upload-pack> 0000 > > > packet: fetch> done > > > > > > In the avarage successful run that "fetch> done" pkt-line is > > > immediately after the "fetch> 0000". Thanks for all of your persistent digging on this. I had forgotten about the "done" packet, but it explains all of the symptoms we've seen. > So instead of immediately die()int after write_in_full() returned an > error, fetch should first try to read all incoming packets in the hope > that the remote did send an ERR packet before it died, and then die > with the error in that packet, or fall back to the current generic > error message if there is no ERR packet (e.g. remote segfaulted or > something similarly horrible). This fixes the test failure with that > strategically-placed sleep() in 'fetch-pack.c'. > > https://travis-ci.org/szeder/git/jobs/578778749#L2689 > > Alas, passing a 'reader' to a function called send_request() doesn't > look quite right, does it... And I'm not sure about the stateless > communication, it still uses write_or_die(). And thank you for putting this patch together. I had taken a stab at it a while ago, but got discouraged by figuring out at which layer to add the "reader" info (I had envisioned it much lower in packet_write(), but it is clear from your patch that fetch-pack does most of its own writing). I agree passing around the reader is a bit weird; I wonder if we should be representing the full-duplex connection more clearly as a single struct. But I suspect that creates other headaches, and what you have here doesn't look _too_ bad. As you note, it probably doesn't cover all code paths, but it at least fixes some of them, and gives us a template for addressing the others. > } else { > - if (write_in_full(fd, buf->buf, buf->len) < 0) > + if (write_in_full(fd, buf->buf, buf->len) < 0) { > + int save_errno = errno; > + /* > + * Read everything the remote has sent to us. > + * If there is an ERR packet, then the loop die()s > + * with the received error message. > + * If we reach EOF without seeing an ERR, then die() > + * with a generic error message, most likely "Broken > + * pipe". > + */ > + while (packet_reader_read(reader) != PACKET_READ_EOF); > + errno = save_errno; > die_errno(_("unable to write to remote")); > + } One unfortunate thing here is that we could block indefinitely in packet_reader_read(). That shouldn't happen, I don't think, but since this is an error case where we've been cutoff, anything's possible. We maybe could get away with using non-blocking I/O. We're looking for an ERR packet the other side sent us _before_ it hung up, so in theory we've have received the data before any FIN packet (or EOF on a pipe). But I'm wary of introducing new races there. It might be enough to put in an actual timer, waiting for an ERR packet, EOF, or something like 5 seconds. Or maybe I'm just being overly paranoid. -Peff