Re: [PATCH 1/1] upload-pack: fix race condition in error messages

Jeff King <peff@xxxxxxxx> · Wed, 4 Sep 2019 01:04:42 -0400

On Fri, Aug 30, 2019 at 02:10:05PM +0200, SZEDER Gábor wrote:

> On Fri, Aug 30, 2019 at 12:06:30AM +0200, SZEDER Gábor wrote:
> > On Thu, Aug 29, 2019 at 11:58:18PM +0200, SZEDER Gábor wrote:
> > > On Thu, Aug 29, 2019 at 10:38:05AM -0400, Jeff King wrote:
> > > > So any fixes there have to happen on the client side. I am still
> > > > confused about why the client is writing in this case, per the argument
> > > > in 014ade7484 (upload-pack: send ERR packet for non-tip objects,
> > > > 2019-04-13). It would be nice to use GIT_TRACE_PACKET to see what it's
> > > > trying to write, but I still haven't been able to reproduce the issue.
> > > 
> > > It's the "done" line:
> > > 
> > >   + cat trace-packet
> [...]
> > >   packet:  upload-pack> 0000
> > >   packet:        fetch> done
> > > 
> > > In the avarage successful run that "fetch> done" pkt-line is
> > > immediately after the "fetch> 0000".

Thanks for all of your persistent digging on this. I had forgotten about
the "done" packet, but it explains all of the symptoms we've seen.

> So instead of immediately die()int after write_in_full() returned an
> error, fetch should first try to read all incoming packets in the hope
> that the remote did send an ERR packet before it died, and then die
> with the error in that packet, or fall back to the current generic
> error message if there is no ERR packet (e.g. remote segfaulted or
> something similarly horrible).  This fixes the test failure with that
> strategically-placed sleep() in 'fetch-pack.c'.
> 
>   https://travis-ci.org/szeder/git/jobs/578778749#L2689
> 
> Alas, passing a 'reader' to a function called send_request() doesn't
> look quite right, does it...  And I'm not sure about the stateless
> communication, it still uses write_or_die().

And thank you for putting this patch together. I had taken a stab at it
a while ago, but got discouraged by figuring out at which layer to add
the "reader" info (I had envisioned it much lower in packet_write(), but
it is clear from your patch that fetch-pack does most of its own
writing).

I agree passing around the reader is a bit weird; I wonder if we should
be representing the full-duplex connection more clearly as a single
struct. But I suspect that creates other headaches, and what you have
here doesn't look _too_ bad. As you note, it probably doesn't cover all
code paths, but it at least fixes some of them, and gives us a template
for addressing the others.

>  	} else {
> -		if (write_in_full(fd, buf->buf, buf->len) < 0)
> +		if (write_in_full(fd, buf->buf, buf->len) < 0) {
> +			int save_errno = errno;
> +			/*
> +			 * Read everything the remote has sent to us.
> +			 * If there is an ERR packet, then the loop die()s
> +			 * with the received error message.
> +			 * If we reach EOF without seeing an ERR, then die()
> +			 * with a generic error message, most likely "Broken
> +			 * pipe".
> +			 */
> +			while (packet_reader_read(reader) != PACKET_READ_EOF);
> +			errno = save_errno;
>  			die_errno(_("unable to write to remote"));
> +		}

One unfortunate thing here is that we could block indefinitely in
packet_reader_read(). That shouldn't happen, I don't think, but since
this is an error case where we've been cutoff, anything's possible.

We maybe could get away with using non-blocking I/O. We're looking for
an ERR packet the other side sent us _before_ it hung up, so in theory
we've have received the data before any FIN packet (or EOF on a pipe).
But I'm wary of introducing new races there.

It might be enough to put in an actual timer, waiting for an ERR packet,
EOF, or something like 5 seconds. Or maybe I'm just being overly
paranoid.

-Peff