Re: [RFC PATCH] fetch-pack: try harder to read an ERR packet

Taylor Blau <me@xxxxxxxxxxxx> · Wed, 22 Apr 2020 17:27:44 -0600

On Wed, Apr 22, 2020 at 06:33:57PM +0200, Christian Couder wrote:

It looks like this may be missing a:

  From: SZEDER Gábor <szeder.dev@xxxxxxxxx>

header.

> When the server has hung up after sending an ERR packet to the
> client, the client might still be writing, for example a "done"
> line. Therefore the client might get a write error before reading
> the ERR packet.
>
> When fetching this could result in the client displaying a
> "Broken pipe" error, instead of the more useful error sent by
> the server in the ERR packet.

All makes sense.

> Instead of immediately die()ing after write_in_full() returned an
> error, fetch should first try to read all incoming packets in the hope
> that the remote did send an ERR packet before it died, and then die
> with the error in that packet, or fall back to the current generic
> error message if there is no ERR packet (e.g. remote segfaulted or
> something similarly horrible).
>
> Signed-off-by: SZEDER Gábor <szeder.dev@xxxxxxxxx>
> Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> ---
> This just formats the following patch from SZEDER Gábor:
>
> https://lore.kernel.org/git/20190830121005.GI8571@xxxxxxxxxx/
>
> I haven't tried to implement some suggestions discussed later
> in the above thread like:
>
>   - renaming send_request()

Agreed that this is probably something we should do. Maybe
'send_request_retry' or something? That name is kind of terrible.

>   - covering more code pathes

I see where Peff raised this point originally, but I think that this is
a good step forward as-is. No need to hold this up looking for complete
coverage, since this is obviously improving the situation.

>   - avoid blocking indefinitely by looking for an ERR packet
>     only if the write() resulted in ECONNRESET or EPIPE

I think that I may have addressed this point below.

>   - first printing an error message with error_errno() before
>     going on to look for an ERR packet

I'm not sure what I think about this one. I'd certainly welcome all
opinions on this matter.

>   - implementing a timeout

A timeout may be a good thing to do. See what you think about my
suggestion below, first, though.

>
> as it seems to me that there was no consensus about those.
>
> It follows up from discussions in this thread:
>
> https://lore.kernel.org/git/cover.1584477196.git.me@xxxxxxxxxxxx/
>
>  fetch-pack.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 1734a573b0..63e8925e2b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -185,14 +185,27 @@ static enum ack_type get_ack(struct packet_reader *reader,
>  }
>
>  static void send_request(struct fetch_pack_args *args,
> -			 int fd, struct strbuf *buf)
> +			 int fd, struct strbuf *buf,
> +			 struct packet_reader *reader)
>  {
>  	if (args->stateless_rpc) {
>  		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
>  		packet_flush(fd);
>  	} else {
> -		if (write_in_full(fd, buf->buf, buf->len) < 0)
> +		if (write_in_full(fd, buf->buf, buf->len) < 0) {

This makes sense; if 'write_in_full' fails, we want to go into the
error-handling case below.

But, I wonder if we could do better than re-using 'write_in_full' here.
We definitely do want to write 'buf->len' bytes overall, but what
happens when a 'write()' fails? I think it's at _that_ point we want to
try and read a packet or two off from the remote.

What if instead this looked something like:

  const char *p = buf;
  ssize_t total = 0;

  while (count > 0) {
    ssize_t written = xwrite(fd, p, count);
    if (written < 0)
      return -1;
    /* note the change on this line */
    if (!written && packet_reader_read(reader) == PACKET_READ_EOF) {
      errno = ENOSPC;
      return -1;
    }
    count -= written;
    p += written;
    total += written;
  }

  return total;

That is basically the definition of 'write_in_full', but when we didn't
get a chance to write anything, then we try to read one packet.

This way, we only read exactly as many packets as we need to when we hit
this case. I'm not sure that it matters in practice, though.

> +			int save_errno = errno;
> +			/*
> +			 * Read everything the remote has sent to us.
> +			 * If there is an ERR packet, then the loop die()s
> +			 * with the received error message.
> +			 * If we reach EOF without seeing an ERR, then die()
> +			 * with a generic error message, most likely "Broken
> +			 * pipe".
> +			 */
> +			while (packet_reader_read(reader) != PACKET_READ_EOF);
> +			errno = save_errno;
>  			die_errno(_("unable to write to remote"));
> +		}
>  	}
>  }
>
> @@ -349,7 +362,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  		const char *arg;
>  		struct object_id oid;
>
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>  		while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
>  			if (skip_prefix(reader.line, "shallow ", &arg)) {
>  				if (get_oid_hex(arg, &oid))
> @@ -372,7 +385,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  			die(_("expected shallow/unshallow, got %s"), reader.line);
>  		}
>  	} else if (!args->stateless_rpc)
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>
>  	if (!args->stateless_rpc) {
>  		/* If we aren't using the stateless-rpc interface
> @@ -395,7 +408,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  			int ack;
>
>  			packet_buf_flush(&req_buf);
> -			send_request(args, fd[1], &req_buf);
> +			send_request(args, fd[1], &req_buf, &reader);
>  			strbuf_setlen(&req_buf, state_len);
>  			flushes++;
>  			flush_at = next_flush(args->stateless_rpc, count);
> @@ -470,7 +483,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  	trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
>  	if (!got_ready || !no_done) {
>  		packet_buf_write(&req_buf, "done\n");
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>  	}
>  	print_verbose(args, _("done"));
>  	if (retval != 0) {
> --
> 2.17.1

Thanks,
Taylor