Re: [PATCH 1/1] upload-pack: fix race condition in error messages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Aug 28, 2019 at 11:34:08AM +0200, SZEDER Gábor wrote:

> On Tue, Aug 27, 2019 at 06:43:29PM -0700, Derrick Stolee via GitGitGadget wrote:
> > Test t5516-fetch-push.sh has a test 'deny fetch unreachable SHA1,
> > allowtipsha1inwant=true' that checks stderr for a specific error
> > string from the remote. In some build environments the error sent
> > over the remote connection gets mingled with the error from the
> > die() statement. Since both signals are being output to the same
> > file descriptor (but from parent and child processes), the output
> > we are matching with grep gets split.
> 
> In the spirit of "An error message is worth a thousand words", I think
> it's worth to include the error message causing the failure:
> 
>   error: 'grep not our ref.*64ea4c133d59fa98e86a771eda009872d6ab2886 err' didn't find a match in:
>   fatal: git upload-pack: not our ref 64ea4c13fatal: remote error: upload-pack: not our ref 63d59fa98e86a771eda009872d6ab2886
>   4ea4c133d59fa98e86a771eda009872d6ab2886
>   error: last command exited with $?=1
> 
> I tried to reproduce this specific error on Linux and macOS, but
> couldn't yet.

I suspect it depends on write() to a file not being atomic, since we
should be able to get the full message out to a single write in both
cases. It's _possible_ that stderr is fully buffered on Windows, but it
generally shouldn't be. If it is, then this might help:

diff --git a/usage.c b/usage.c
index 2fdb20086b..d6df31bc5b 100644
--- a/usage.c
+++ b/usage.c
@@ -10,13 +10,19 @@ void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	char *p;
-
-	vsnprintf(msg, sizeof(msg), err, params);
+	size_t len;
+
+	/* truncation is OK here, but make sure we have a newline */
+	len = xsnprintf(msg, sizeof(msg), "%s", prefix);
+	len += vsnprintf(msg + len, sizeof(msg) - len, err, params);
+	if (len >= sizeof(msg) - 1)
+		len--;
+	len += xsnprintf(msg + len, sizeof(msg) - len, "\n");
 	for (p = msg; *p; p++) {
 		if (iscntrl(*p) && *p != '\t' && *p != '\n')
 			*p = '?';
 	}
-	fprintf(stderr, "%s%s\n", prefix, msg);
+	write(2, msg, len);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)

But again, I'm doubtful.

> Here you talk about reducing the risk ...
> 
> > 1. Write an error message to stderr.
> > 2. Write an error message across the connection.
> > 3. exit(1).
> > 
> > This reorders the events so the error is written entirely before
> > the client receives a message from the remote, removing the race
> > condition.
> 
> ... but here you talk about removing the race condition.
> 
> I don't understand how this change would remove the race condition.
> After all, the occasional failure is caused by two messages racing
> through different file descriptors, and merely sending them in reverse
> order doesn't change that they would still be racing.

I had the same puzzlement. I think the answer might be that in the
original, we can have two write()s happening simultaneously:

  1. upload-pack sends the packet to the client

  2. client receives packet

  3a. upload-pack then writes to stderr

  3b. simultaneously, the client writes to stderr

But by reordering, step 3a is completed before step 1.

> > +			warning("git upload-pack: not our ref %s",
> > +				oid_to_hex(&o->oid));
> >  			packet_writer_error(writer,
> >  					    "upload-pack: not our ref %s",
> >  					    oid_to_hex(&o->oid));
> > -			die("git upload-pack: not our ref %s",
> > -			    oid_to_hex(&o->oid));
> > +			exit(1);
> 
> 
> So, the error coming from the 'git fetch' command in question
> currently looks like this:
> 
>   fatal: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
>   fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
> 
> Changing die() to a warning() changes the prefix of the message from
> "fatal:" to "warning:", i.e. with this patch I got this:
> 
>   warning: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
>   fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
> 
> I don't think that "demoting" that message from fatal to warning
> matters much to users, because they would see the (arguably redundant)
> second line starting with "fatal:".

An interesting thing about this message is that many users won't see it
at all! It requires that they be connected to the stderr of upload-pack,
which happens only for local-filesystem clones, and for git-over-ssh to
a vanilla ssh session.

It won't be seen for git:// nor for https://, which go to the stderr of
git-daemon and the webserver respectively. Nor would it be seen on any
hosting site which uses something other than vanilla sshd to terminate
the connection (at GitHub for example, ssh terminates in a proxy layer
which then uses a different protocol to run upload-pack on the backend).

That's why we've been adding this sideband-level errors: so everybody
can see them. So unless the client is a 2005-era version of Git that
lacks sideband, the die message is either invisible to the user, or
redundant.

I'd almost suggest it is worth dropping the die() message altogether
here, but it does carry some value to people collecting them on the
server side[1].

Another solution is for Git to redirect the stderr of the child
upload-pack it runs, which puts all invocations on the same footing,
regardless of protocol. But that risks losing useful messages from
before the sideband starts up (e.g., ssh failures).

[1] At GitHub, we record the exit status of every Git process, and we've
    hooked die() to record the message, which is often helpful for
    post-facto debugging. I'd hate to lose it, but we could certainly
    work around it for this case, perhaps by hooking
    packet_writer_error() in a similar way.

> Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
> --stress' to try to reproduce a failure caused by those mingled
> messages, the same check only failed for a different reason so far
> (both on Linux and macOS (on Travis CI)):

There's some hand-waving argument that this should be race-free in
014ade7484 (upload-pack: send ERR packet for non-tip objects,
2019-04-13), but I am not too surprised if there is a flaw in that
logic.

This kind of race is a problem anytime upload-pack dies. And it's not
just a test issue, but a real problem for users: they might see EPIPE
instead of the error message from the server (it used to be SIGPIPE but
we recently relaxed that).

There's details and a possible path forward discussed in:

  https://public-inbox.org/git/20190305041139.GA19800@xxxxxxxxxxxxxxxxxxxxx/

-Peff



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux