On Sun, Apr 7, 2013 at 7:51 PM, Jeff King <peff@xxxxxxxx> wrote: > On Sun, Apr 07, 2013 at 01:45:06AM -0600, Felipe Contreras wrote: > >> diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh >> index f387027..efe67e2 100755 >> --- a/t/t5801-remote-helpers.sh >> +++ b/t/t5801-remote-helpers.sh >> @@ -166,4 +166,23 @@ test_expect_success 'push ref with existing object' ' >> compare_refs local dup server dup >> ' >> >> +test_expect_success 'proper failure checks for fetching' ' >> + (GIT_REMOTE_TESTGIT_FAILURE=1 && >> + export GIT_REMOTE_TESTGIT_FAILURE && >> + cd local && >> + test_must_fail git fetch 2> error && >> + grep "Error while running helper" error >> + ) >> +' > > Hmm. If you drop the final "grep" (which is looking for the error > message you add elsewhere in this patch), this test passes even without > the addition of the check_command calls added by your patch. Which made > me wonder if we should be checking something else (repo state, error > messages, etc), since we seem to otherwise be detecting the error. More > analysis below on exactly what is going on there. It fails all right, because the fetch process assumes the remote-helper succeeded and only later eventually fails with an error that is anything but user-friendly: fatal: bad object 0000000000000000000000000000000000000000 error: testgit::/home/felipec/dev/git/t/trash directory.t5801-remote-helpers/server did not send all necessary objects >> +# We sleep to give fast-export a chance to catch the SIGPIPE > > I'm not sure what this means; I don't see a sleep anywhere. That's right, I'll remove the comment. It was cruft from the next patch (see below). >> +test_expect_failure 'proper failure checks for pushing' ' >> + (GIT_REMOTE_TESTGIT_FAILURE=1 && >> + export GIT_REMOTE_TESTGIT_FAILURE && >> + cd local && >> + test_must_fail git push --all 2> error && >> + grep "Error while running helper" error >> + ) >> +' > > I wondered why this one is marked for failure. Because it fails to report the right error. > Even without check_command, the push produces: > > error: fast-export died of signal 13 > fatal: Error while running fast-export > > which explains why the test fails: it does not even make it to the > check_command call at all. Why do we need check_command here, then? Is > it racy/non-deterministic for fast-export to die due to the pipe? It does not make it to check_command because the wait is missing. I added that in a separate patch: http://article.gmane.org/gmane.comp.version-control.git/219715 With that patch the check_command happens reliably, the error is reported correctly, and the test passes. However, the way to trigger it is ugly. In a real use-case the chances of check_command catching the issue are much higher.. >> diff --git a/transport-helper.c b/transport-helper.c >> index cb3ef7d..dfdfa7a 100644 >> --- a/transport-helper.c >> +++ b/transport-helper.c >> @@ -460,6 +460,10 @@ static int fetch_with_import(struct transport *transport, >> >> if (finish_command(&fastimport)) >> die("Error while running fast-import"); >> + >> + if (!check_command(data->helper)) >> + die("Error while running helper"); >> + >> argv_array_free_detached(fastimport.argv); > > I'm still not very excited about this approach, as it is racy to detect > errors. E.g., there is nothing to say that the helper does not close the > pipe to fast-import prematurely and then die afterwards, leaving the > refs unwritten, fast-import happy, but a failed exit code from the > helper. That's right, and that would leave us in the situation we are right now, even with "done" check. > The import test still passes even without the check_command part of your > patch because some of the refs in refs/testgit do not exist prior to the > failed fetch, and therefore also do not exist afterwards. When fetch > tries to feed their sha1s to check_everything_connected, you get the > funny: > > fatal: bad object 0000000000000000000000000000000000000000 > error: testgit::/home/peff/compile/git/t/trash > directory.t5801-remote-helpers/server did not send all necessary > objects And you think that is desirable? User-friendly? > we can see that without your check_command, the failure in the second > fetch is not noticed. Adding in your patch does detect this. _But_ it is > only checking a specific failure mode of the remote-helper: process > death that results in closing the fast-import pipe, which is how your > GIT_REMOTE_TESTGIT_FAILURE is implemented (closing the pipe first and > then dying is racy). > > If we then add this on top of your series (fixing the "status" bug in > patch 1 that Junio mentioned), the test will start failing (both the > original, and the more robust one I showed above): > > diff --git a/git-remote-testgit b/git-remote-testgit > index ca0cf09..41b0780 100755 > --- a/git-remote-testgit > +++ b/git-remote-testgit > @@ -64,6 +64,11 @@ do > > if test -n "$GIT_REMOTE_TESTGIT_FAILURE" > then > + # close stdout > + exec >/dev/null > + # now sleep to make sure fast-import > + # has time to die before we exit > + sleep 1 > exit -1 > fi Yeah, again: http://article.gmane.org/gmane.comp.version-control.git/219715 > I agree that the failure mode from your patch is probably the most > common order for helpers to fail in (i.e., they just die and that's what > kills the pipe). But I wonder if we can just cover all cases. > Something like: > > diff --git a/transport-helper.c b/transport-helper.c > index dfdfa7a..8562df0 100644 > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -461,7 +461,31 @@ static int fetch_with_import(struct transport *transport, > if (finish_command(&fastimport)) > die("Error while running fast-import"); > > - if (!check_command(data->helper)) > + /* > + * We must disconnect from the helper at this point, because even > + * though fast-import may have succeeded, it may only be because the > + * helper was not able to feed fast-import all of the data, and what > + * fast-import got looked OK (e.g., it may have got nothing if the > + * helper died early). We still need to check the return code of the > + * helper to make sure it is happy with what it sent. > + * > + * Since the import command does not require the helper to ever report > + * success/failure of the import, we have no mechanism to check for > + * problems except to check its exit status. > + * > + * Callers of the transport code are allowed to make more requests > + * of our helper, so we may be disconnecting before they expect in that > + * case. However: > + * > + * 1. Current callers don't do that; after fetching refs, there > + * is nothing left for the helper to do. > + * > + * 2. We transparently start the helper as necessary, so if we were > + * to get another request (e.g., to import more refs), we would > + * simply start a new instantiation of the helper. > + * > + */ That's a comprehensive essay, unfortunately, it's not correct. The exit status of the remote-helper is not important, it's the one of fast-import that we really care about. > + if (disconnect_helper(transport) != 0) > die("Error while running helper"); Yeah, that's good, *if* the remote-helper is implemented correctly, but try this: if test -n "$GIT_REMOTE_TESTGIT_FAILURE" then exit 0 fi > argv_array_free_detached(fastimport.argv); > > which passes both your original test and the more strict one above. Of > course adding a done-import capability would be nice to fix the protocol > deficiency, but it is more code. The done-import capability *is already there*. The remote helper fails to say "done", fast-import detects that there was a problem and exits with -1 (or whatever), but it doesn't matter, because we (transport-helper) are oblivious of that return status until it's too late. >> @@ -818,6 +822,10 @@ static int push_refs_with_export(struct transport *transport, >> >> if (finish_command(&exporter)) >> die("Error while running fast-export"); >> + >> + if (!check_command(data->helper)) >> + die("Error while running helper"); >> + >> push_update_refs_status(data, remote_refs); >> return 0; >> } > > And this one is even more likely to race than the import case, I think, > as the exporter may send all of its data, exit, and then the helper > chugs on it for a bit (converting, sending across the network, etc). If > it is still chugging when we run check_command, we will not notice > anything here. > > E.g., if we instrument the failure like this: > > diff --git a/git-remote-testgit b/git-remote-testgit > index ca0cf09..a912bc1 100755 > --- a/git-remote-testgit > +++ b/git-remote-testgit > @@ -75,6 +75,12 @@ do > export) > if test -n "$GIT_REMOTE_TESTGIT_FAILURE" > then > + # imagine we read all of fast-export's data > + # first, and then die while trying to convert > + # it > + while read line; do > + test "$line" = "done" && break > + done > exit -1 > fi > > > we do not get the sigpipe from fast-export, and depending on the timing, > your check_command may or may not do anything (in my tests, it did not). > > But unlike the import side, the export command _does_ give us a status > report back from the helper: it prints an ok/error line for each ref > that was exported, followed by a blank line. So we should be able to > confirm that we get the blank line at all, and then that each ref was > present, which would determine whether the export failed or not without > being subject to race conditions. > > And we seem to do those checks already; the only problem I see is that > recvline does not print a message before dying. Would the patch below be > sufficient? > > diff --git a/transport-helper.c b/transport-helper.c > index dfdfa7a..cce2062 100644 > --- a/transport-helper.c > +++ b/transport-helper.c > @@ -54,7 +54,7 @@ static int recvline_fh(FILE *helper, struct strbuf *buffer) > if (strbuf_getline(buffer, helper, '\n') == EOF) { > if (debug) > fprintf(stderr, "Debug: Remote helper quit.\n"); > - exit(128); > + die("remote helper died unexpectedly"); > } > > if (debug) Yeah, I already explored this option, and I said it was possible on pushing, but now the problem is fetching. http://article.gmane.org/gmane.comp.version-control.git/219760 And to be frank, I'm tired of this. I keep repeating the same things over and over, when I ask for input on different ways to achieve this I get nothing, and when the patch is getting closer to be merged, then I receive criticism, only so I can repeat the same again. I don't like the option to die right in recvline(), but it would work. We would need something else for import though. It would be possible to tell fast-import to ping the remote-helper, but I ran into a SIGPIPE, and I don't have the patience to find out if perhaps there's a way to solve that. This option seems rather hacky and asymmetric to me. If this current patch series is not good enough, I think it's best to leave things as they are for me. Cheers. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html