[PATCH 1/5] send-pack: close demux pipe before finishing async process

Jeff King <peff@xxxxxxxx> · Tue, 19 Apr 2016 18:45:17 -0400

This fixes a deadlock on the client side when pushing a
large number of refs from a corrupted repo.  There's a
reproduction script below, but let's start with a
human-readable explanation.

The client side of a push goes something like this:

  1. Start an async process to demux sideband coming from
     the server.

  2. Run pack-objects to send the actual pack, and wait for
     its status via finish_command().

  3. If pack-objects failed, abort immediately.

  4. If pack-objects succeeded, read the per-ref status from
     the server, which is actually coming over a pipe from
     the demux process started in step 1.

We run finish_async() to wait for and clean up the demux
process in two places. In step 3, if we see an error, we
want it to end early. And after step 4, it should be done
writing any data and we are just cleaning it up.

Let's focus on the error case first. We hand the output
descriptor to the server over to pack-objects. So by the
time it has returned an error to us, it has closed the
descriptor and the server has gotten EOF. The server will
mark all refs as failed with "unpacker error" and send us
back the status for each (followed by EOF).

This status goes to the demuxer thread, which relays it over
a pipe to the main thread. But the main thread never even
tries reading the status. It's trying to bail because of the
pack-objects error, and is waiting for the demuxer thread to
finish. If there are a small number of refs, that's OK; the
demuxer thread writes into the pipe buffer, sees EOF from
the server, and quits. But if there are a large number of
refs, it may block on write() back to the main thread,
leading to a deadlock (the main thread is waiting for the
demuxer to finish, the demuxer is waiting for the main
thread to read).

We can break this deadlock by closing the pipe between the
demuxer and the main thread before calling finish_async().
Then the demuxer gets a write() error and exits.

The non-error case usually just works, because we will have
read all of the data from the other side. We do close
demux.out already, but we only do so _after_ calling
finish_async(). This is OK because there shouldn't be any
more data coming from the server. But technically we've only
read to a flush packet, and a broken or malicious server
could be sending more cruft. In such a case, we would hit
the same deadlock. Closing the pipe first doesn't affect the
normal case, and means that for a cruft-sending server,
we'll notice a write() error rather than deadlocking.

Note that when write() sees this error, we'll actually
deliver SIGPIPE to the thread, which will take down the
whole process (unless we're compiled with NO_PTHREADS). This
isn't ideal, but it's an improvement over the status quo,
which is deadlocking. And SIGPIPE handling in async threads
is a bigger problem that we can deal with separately.

A simple reproduction for the error case is below. It's
technically racy (we could exit the main process and take
down the async thread with us before it even reads the
status), though in practice it seems to fail pretty
consistently.

    git init repo &&
    cd repo &&

    # make some commits; we need two so we can simulate corruption
    # in the history later.
    git commit --allow-empty -m one &&
    one=$(git rev-parse HEAD) &&
    git commit --allow-empty -m two &&
    two=$(git rev-parse HEAD) &&

    # now make a ton of refs; our goal here is to overflow the pipe buffer
    # when reporting the ref status, which will cause the demuxer to block
    # on write()
    for i in $(seq 20000); do
    	echo "create refs/heads/this-is-a-really-long-branch-name-$i $two"
    done |
    git update-ref --stdin &&

    # now make a corruption in the history such that pack-objects will fail
    rm -vf .git/objects/$(echo $one | sed 's}..}&/}') &&

    # and then push the result
    git init --bare dst.git &&
    git push --mirror dst.git

Signed-off-by: Jeff King <peff@xxxxxxxx>
---
Even though the SIGPIPE isn't great, it's certainly better
than a deadlock.  So if we had to take just this one for
"maint" (e.g., because the SIGPIPE stuff causes problems on
Windows), I think it's still worth doing (and is why I stuck
this patch at the front of the series).

One alternative would be to add async_cancel() which would
run pthread_cancel() for the threaded case, and kill() for
the non-threaded. That helps the error case, but makes the
non-error case a bit weird (if the thread _isn't_ blocking
on write to us, we want to get the real status code, rather
than canceling it. But we don't have any way of knowing
which of _cancel() or finish() to call).

 send-pack.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/send-pack.c b/send-pack.c
index 047bd18..fc27db6 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -531,8 +531,10 @@ int send_pack(struct send_pack_args *args,
 				close(out);
 			if (git_connection_is_socket(conn))
 				shutdown(fd[0], SHUT_WR);
-			if (use_sideband)
+			if (use_sideband) {
+				close(demux.out);
 				finish_async(&demux);
+			}
 			fd[1] = -1;
 			return -1;
 		}
@@ -551,11 +553,11 @@ int send_pack(struct send_pack_args *args,
 		packet_flush(out);
 
 	if (use_sideband && cmds_sent) {
+		close(demux.out);
 		if (finish_async(&demux)) {
 			error("error in sideband demultiplexer");
 			ret = -1;
 		}
-		close(demux.out);
 	}
 
 	if (ret < 0)
-- 
2.8.1.512.g4e0a533

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html