Re: [PATCH v3 2/2] transport-helper: check if remote helper is alive

Jeff King <peff@xxxxxxxx> · Sun, 7 Apr 2013 20:51:55 -0400

On Sun, Apr 07, 2013 at 01:45:06AM -0600, Felipe Contreras wrote:

> diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
> index f387027..efe67e2 100755
> --- a/t/t5801-remote-helpers.sh
> +++ b/t/t5801-remote-helpers.sh
> @@ -166,4 +166,23 @@ test_expect_success 'push ref with existing object' '
>  	compare_refs local dup server dup
>  '
>  
> +test_expect_success 'proper failure checks for fetching' '
> +	(GIT_REMOTE_TESTGIT_FAILURE=1 &&
> +	export GIT_REMOTE_TESTGIT_FAILURE &&
> +	cd local &&
> +	test_must_fail git fetch 2> error &&
> +	grep "Error while running helper" error
> +	)
> +'

Hmm.  If you drop the final "grep" (which is looking for the error
message you add elsewhere in this patch), this test passes even without
the addition of the check_command calls added by your patch. Which made
me wonder if we should be checking something else (repo state, error
messages, etc), since we seem to otherwise be detecting the error. More
analysis below on exactly what is going on there.

> +# We sleep to give fast-export a chance to catch the SIGPIPE

I'm not sure what this means; I don't see a sleep anywhere.

> +test_expect_failure 'proper failure checks for pushing' '
> +	(GIT_REMOTE_TESTGIT_FAILURE=1 &&
> +	export GIT_REMOTE_TESTGIT_FAILURE &&
> +	cd local &&
> +	test_must_fail git push --all 2> error &&
> +	grep "Error while running helper" error
> +	)
> +'

I wondered why this one is marked for failure.

Even without check_command, the push produces:

  error: fast-export died of signal 13
  fatal: Error while running fast-export

which explains why the test fails: it does not even make it to the
check_command call at all. Why do we need check_command here, then? Is
it racy/non-deterministic for fast-export to die due to the pipe?

> diff --git a/transport-helper.c b/transport-helper.c
> index cb3ef7d..dfdfa7a 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -460,6 +460,10 @@ static int fetch_with_import(struct transport *transport,
>  
>  	if (finish_command(&fastimport))
>  		die("Error while running fast-import");
> +
> +	if (!check_command(data->helper))
> +		die("Error while running helper");
> +
>  	argv_array_free_detached(fastimport.argv);

I'm still not very excited about this approach, as it is racy to detect
errors. E.g., there is nothing to say that the helper does not close the
pipe to fast-import prematurely and then die afterwards, leaving the
refs unwritten, fast-import happy, but a failed exit code from the
helper.

The import test still passes even without the check_command part of your
patch because some of the refs in refs/testgit do not exist prior to the
failed fetch, and therefore also do not exist afterwards. When fetch
tries to feed their sha1s to check_everything_connected, you get the
funny:

  fatal: bad object 0000000000000000000000000000000000000000
  error: testgit::/home/peff/compile/git/t/trash
   directory.t5801-remote-helpers/server did not send all necessary
   objects

But if we change the test like this:

diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index efe67e2..e968ea9 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -167,11 +167,11 @@ test_expect_success 'proper failure checks for fetching' '
 '
 
 test_expect_success 'proper failure checks for fetching' '
-	(GIT_REMOTE_TESTGIT_FAILURE=1 &&
+	(cd local &&
+	git fetch &&
+	GIT_REMOTE_TESTGIT_FAILURE=1 &&
 	export GIT_REMOTE_TESTGIT_FAILURE &&
-	cd local &&
-	test_must_fail git fetch 2> error &&
-	grep "Error while running helper" error
+	test_must_fail git fetch 2> error
 	)
 '
 
we can see that without your check_command, the failure in the second
fetch is not noticed. Adding in your patch does detect this. _But_ it is
only checking a specific failure mode of the remote-helper: process
death that results in closing the fast-import pipe, which is how your
GIT_REMOTE_TESTGIT_FAILURE is implemented (closing the pipe first and
then dying is racy).

If we then add this on top of your series (fixing the "status" bug in
patch 1 that Junio mentioned), the test will start failing (both the
original, and the more robust one I showed above):

diff --git a/git-remote-testgit b/git-remote-testgit
index ca0cf09..41b0780 100755
--- a/git-remote-testgit
+++ b/git-remote-testgit
@@ -64,6 +64,11 @@ do
 
 		if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
 		then
+			# close stdout
+			exec >/dev/null
+			# now sleep to make sure fast-import
+			# has time to die before we exit
+			sleep 1
 			exit -1
 		fi
 

I agree that the failure mode from your patch is probably the most
common order for helpers to fail in (i.e., they just die and that's what
kills the pipe).  But I wonder if we can just cover all cases.
Something like:

diff --git a/transport-helper.c b/transport-helper.c
index dfdfa7a..8562df0 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -461,7 +461,31 @@ static int fetch_with_import(struct transport *transport,
 	if (finish_command(&fastimport))
 		die("Error while running fast-import");
 
-	if (!check_command(data->helper))
+	/*
+	 * We must disconnect from the helper at this point, because even
+	 * though fast-import may have succeeded, it may only be because the
+	 * helper was not able to feed fast-import all of the data, and what
+	 * fast-import got looked OK (e.g., it may have got nothing if the
+	 * helper died early). We still need to check the return code of the
+	 * helper to make sure it is happy with what it sent.
+	 *
+	 * Since the import command does not require the helper to ever report
+	 * success/failure of the import, we have no mechanism to check for
+	 * problems except to check its exit status.
+	 *
+	 * Callers of the transport code are allowed to make more requests
+	 * of our helper, so we may be disconnecting before they expect in that
+	 * case. However:
+	 *
+	 *   1. Current callers don't do that; after fetching refs, there
+	 *      is nothing left for the helper to do.
+	 *
+	 *   2. We transparently start the helper as necessary, so if we were
+	 *      to get another request (e.g., to import more refs), we would
+	 *      simply start a new instantiation of the helper.
+	 *
+	 */
+	if (disconnect_helper(transport) != 0)
 		die("Error while running helper");
 
 	argv_array_free_detached(fastimport.argv);

which passes both your original test and the more strict one above. Of
course adding a done-import capability would be nice to fix the protocol
deficiency, but it is more code.

> @@ -818,6 +822,10 @@ static int push_refs_with_export(struct transport *transport,
>  
>       if (finish_command(&exporter))
>               die("Error while running fast-export");
> +
> +     if (!check_command(data->helper))
> +             die("Error while running helper");
> +
>       push_update_refs_status(data, remote_refs);
>       return 0;
>  }

And this one is even more likely to race than the import case, I think,
as the exporter may send all of its data, exit, and then the helper
chugs on it for a bit (converting, sending across the network, etc). If
it is still chugging when we run check_command, we will not notice
anything here.

E.g., if we instrument the failure like this:

diff --git a/git-remote-testgit b/git-remote-testgit
index ca0cf09..a912bc1 100755
--- a/git-remote-testgit
+++ b/git-remote-testgit
@@ -75,6 +75,12 @@ do
 	export)
 		if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
 		then
+			# imagine we read all of fast-export's data
+			# first, and then die while trying to convert
+			# it
+			while read line; do
+				test "$line" = "done" && break
+			done
 			exit -1
 		fi
 

we do not get the sigpipe from fast-export, and depending on the timing,
your check_command may or may not do anything (in my tests, it did not).

But unlike the import side, the export command _does_ give us a status
report back from the helper: it prints an ok/error line for each ref
that was exported, followed by a blank line. So we should be able to
confirm that we get the blank line at all, and then that each ref was
present, which would determine whether the export failed or not without
being subject to race conditions.

And we seem to do those checks already; the only problem I see is that
recvline does not print a message before dying. Would the patch below be
sufficient?

diff --git a/transport-helper.c b/transport-helper.c
index dfdfa7a..cce2062 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -54,7 +54,7 @@ static int recvline_fh(FILE *helper, struct strbuf *buffer)
 	if (strbuf_getline(buffer, helper, '\n') == EOF) {
 		if (debug)
 			fprintf(stderr, "Debug: Remote helper quit.\n");
-		exit(128);
+		die("remote helper died unexpectedly");
 	}
 
 	if (debug)

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html