On Wed, Sep 04, 2019 at 01:04:42AM -0400, Jeff King wrote: > On Fri, Aug 30, 2019 at 02:10:05PM +0200, SZEDER Gábor wrote: > > > On Fri, Aug 30, 2019 at 12:06:30AM +0200, SZEDER Gábor wrote: > > > On Thu, Aug 29, 2019 at 11:58:18PM +0200, SZEDER Gábor wrote: > > > > On Thu, Aug 29, 2019 at 10:38:05AM -0400, Jeff King wrote: > > > > > So any fixes there have to happen on the client side. I am still > > > > > confused about why the client is writing in this case, per the argument > > > > > in 014ade7484 (upload-pack: send ERR packet for non-tip objects, > > > > > 2019-04-13). It would be nice to use GIT_TRACE_PACKET to see what it's > > > > > trying to write, but I still haven't been able to reproduce the issue. > > > > > > > > It's the "done" line: > > > > > > > > + cat trace-packet > > [...] > > > > packet: upload-pack> 0000 > > > > packet: fetch> done > > > > > > > > In the avarage successful run that "fetch> done" pkt-line is > > > > immediately after the "fetch> 0000". > > Thanks for all of your persistent digging on this. Yeah, I can be easily distracted by an interesting looking bug... was told it's a character flaw ;) > I had forgotten about > the "done" packet, but it explains all of the symptoms we've seen. > > > So instead of immediately die()int after write_in_full() returned an > > error, fetch should first try to read all incoming packets in the hope > > that the remote did send an ERR packet before it died, and then die > > with the error in that packet, or fall back to the current generic > > error message if there is no ERR packet (e.g. remote segfaulted or > > something similarly horrible). This fixes the test failure with that > > strategically-placed sleep() in 'fetch-pack.c'. > > > > https://travis-ci.org/szeder/git/jobs/578778749#L2689 > > > > Alas, passing a 'reader' to a function called send_request() doesn't > > look quite right, does it... And I'm not sure about the stateless > > communication, it still uses write_or_die(). > > And thank you for putting this patch together. I had taken a stab at it > a while ago, but got discouraged by figuring out at which layer to add > the "reader" info (I had envisioned it much lower in packet_write(), but > it is clear from your patch that fetch-pack does most of its own > writing). > > I agree passing around the reader is a bit weird; I considered renaming send_request() so that 'reader' won't look that out of place among its parameters, but all my ideas were ridiculous, e.g. send_request_and_process_ERR_pkt_on_error()... > I wonder if we should > be representing the full-duplex connection more clearly as a single > struct. But I suspect that creates other headaches, and what you have > here doesn't look _too_ bad. As you note, it probably doesn't cover all > code paths, but it at least fixes some of them, and gives us a template > for addressing the others. > > > } else { > > - if (write_in_full(fd, buf->buf, buf->len) < 0) > > + if (write_in_full(fd, buf->buf, buf->len) < 0) { > > + int save_errno = errno; > > + /* > > + * Read everything the remote has sent to us. > > + * If there is an ERR packet, then the loop die()s > > + * with the received error message. > > + * If we reach EOF without seeing an ERR, then die() > > + * with a generic error message, most likely "Broken > > + * pipe". > > + */ > > + while (packet_reader_read(reader) != PACKET_READ_EOF); > > + errno = save_errno; > > die_errno(_("unable to write to remote")); > > + } > > One unfortunate thing here is that we could block indefinitely in > packet_reader_read(). That shouldn't happen, I don't think, but since > this is an error case where we've been cutoff, anything's possible. Yeah, when we use different file descriptors for reading and writing, then any error on the writing fd doesn't necessarily mean that there is on an error on the reading fd as well. I mean, we could get an EBADF or EFAULT as well, but those would rather indicate a bug in Git than an error with the connection itself. I wondered whether we could avoid blocking indefinitely by looking for an ERR packet only if the write() resulted in ECONNRESET or EPIPE, i.e. that indicate a connection error. I suppose 'git upload-pack' (or an alternative implementation) could be buggy and could inadvertently close() the other end of the fd that fetch-pack writes to but not the fd where it reads from, so the write() would get ECONNRESET, but then packet_reader_read() could still hang on the still open read fd. I'm not sure whether it's worth worrying about; I mean a buggy 'git upload-pack' can do all kinds of weird things that would lead to a hang. Anyway, after write_in_full() returns with error we could first print an error message with error_errno() and then go on to look for an ERR packet. So even if packet_reader_read() hangs, at least there will be an error message for the user to see. It wouldn't help automation, though. > We maybe could get away with using non-blocking I/O. We're looking for > an ERR packet the other side sent us _before_ it hung up, so in theory > we've have received the data before any FIN packet (or EOF on a pipe). > But I'm wary of introducing new races there. > > It might be enough to put in an actual timer, waiting for an ERR packet, > EOF, or something like 5 seconds. Or maybe I'm just being overly > paranoid. I think the timeout would be the safest bet, but then we would have to pass that timeout parameter through a couple of a function calls...