Re: t5570-git-daemon fails with SIGPIPE on OSX

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Peff,

I just had a look at the patch you provided below (for some reason, my
previous search on public-inbox only turned up Gábor's mail to which you
responded).

Admittedly, I do not really understand all aspects of it, but it applies,
still, and I kicked off a stress test here:

	https://dev.azure.com/git/git/_build/results?buildId=338

It seems that your patch fixes that t5570 flakiness on macOS, and more
importantly, addresses an important issue on macOS.

Will play a bit more with it and keep you posted.

Ciao,
Dscho

On Tue, 14 Aug 2018, Jeff King wrote:

> On Mon, Aug 06, 2018 at 05:11:13PM +0200, SZEDER Gábor wrote:
> 
> >   - 'git upload-pack' receives the request, parses the want line,
> >     notices the corrupt pack, responds with an 'ERR upload-pack: not
> >     our ref' pkt-line, and die()s right away.
> > 
> >   - 'git fetch' finally approaches the end of the function, where it
> >     attempts to send a done pkt-line via another send_request() call
> >     through the now closing TCP socket.
> > 
> >   - What happens now seems to depend on the platform:
> > 
> >     - On Linux, both on my machine and on Travis CI, it shows textbook
> >       example behaviour: write() returns with error and sets errno to
> >       ECONNRESET.  Since it happens in write_or_die(), 'git fetch'
> >       die()s with 'fatal: write error: Connection reset by peer', and
> >       doesn't show the error send by 'git upload-pack'; how could it,
> >       it doesn't even get as far to receive upload-pack's ERR
> >       pkt-line.
> > 
> >       The test only checks that 'git fetch' fails, but it doesn't
> >       check whether it failed with the right error message, so the
> >       test still succeeds.  Had it checked the error message as well,
> >       we most likely had noticed this issue already, it doesn't happen
> >       all that rarely.
> 
> Hmm. Traditionally we did not send ERR as part of upload-pack at all. It
> was the message you got from git-daemon if it couldn't start the
> requested sub-process. It was only later in bdb31eada7 (upload-pack:
> report "not our ref" to client, 2017-02-23) that we started sending
> them. So I think that is why it does not check the error message: it is
> not expecting that case at all (and it is not actually interesting here,
> as the real problem is that the remote side is corrupt, but it sadly
> does not say anything so useful).
> 
> I think that's somewhat tangential, though. The root of the issue is
> this:
> 
> >     - On the new OSX images with XCode 9.4 on Travis CI the write()
> >       triggers SIGPIPE right away, and 'test_must_fail' notices it and
> >       fails the test.  I couldn't see any sign of an ECONNRESET or any
> >       other error that we could act upon to avoid the SIGPIPE.
> 
> Right, as soon as we get SIGPIPE we can't offer any useful message,
> because we're dead. I would argue that fetch should simply turn off
> SIGPIPE entirely, and rely on getting EPIPE from write(). But since
> we're in write_or_die(), it actually turns EPIPE back into a SIGPIPE
> death!
> 
> So we'd probably also want to teach it to use a real write_in_full(),
> and then output a more useful message in this case. write_or_die()
> really does produce bad messages regardless, because it doesn't know
> what it's writing to.
> 
> That would give us a baby step in the right direction, because at least
> we'd always be doing a controlled die() then. And then the next step
> would be to show the remote error message (even though it's not actually
> useful in this case, in theory upload-pack could generate something
> better). And that would mean turning the die() on write into an attempt
> to drain any ERR messages before either dying or returning an error up
> the stack.
> 
> I suspect the (largely untested) patch below would make your test
> problems go away. Or instead, we could simply add sigpipe=ok to the
> test_must_fail invocation, but I agree with you that the current
> behavior on OS X is not ideal (the user sees no error message).
> 
> -Peff
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 5714bcbddd..3e80604562 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -188,8 +188,10 @@ static void send_request(struct fetch_pack_args *args,
>  	if (args->stateless_rpc) {
>  		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
>  		packet_flush(fd);
> -	} else
> -		write_or_die(fd, buf->buf, buf->len);
> +	} else {
> +		if (write_in_full(fd, buf->buf, buf->len) < 0)
> +			die_errno("unable to write to remote");
> +	}
>  }
>  
>  static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
> @@ -1167,7 +1169,8 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
>  
>  	/* Send request */
>  	packet_buf_flush(&req_buf);
> -	write_or_die(fd_out, req_buf.buf, req_buf.len);
> +	if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
> +		die_errno("unable to write request to remote");
>  
>  	strbuf_release(&req_buf);
>  	return ret;
> diff --git a/pkt-line.c b/pkt-line.c
> index a593c08aad..450d0801b1 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -88,13 +88,15 @@ static void packet_trace(const char *buf, unsigned int len, int write)
>  void packet_flush(int fd)
>  {
>  	packet_trace("0000", 4, 1);
> -	write_or_die(fd, "0000", 4);
> +	if (write_in_full(fd, "0000", 4) < 0)
> +		die_errno("unable to write flush packet");
>  }
>  
>  void packet_delim(int fd)
>  {
>  	packet_trace("0001", 4, 1);
> -	write_or_die(fd, "0001", 4);
> +	if (write_in_full(fd, "0000", 4) < 0)
> +		die_errno("unable to write delim packet");
>  }
>  
>  int packet_flush_gently(int fd)
> 
> 

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux