Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests

Jeff King <peff@xxxxxxxx> · Sun, 21 Nov 2021 23:51:54 -0500

On Sun, Nov 21, 2021 at 06:17:04PM -0800, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > I'm not 100% sure this fixes any possible races, as the race Junio
> > initially reported seemed to be in the "propagated signals from pager"
> > test, which I don't think has these flaky-SIGPIPE problems. But I think
> > it's at least correcting some of the confusion. And we can see if it
> > happens again (I haven't been able to trigger any failures with --stress
> > myself).
> 
> Applying this (or this and the follow-up) seems to make t7006, which
> used to be flaky, to consistently fail at test "git returns SIGPIPE
> on propagated signals from pager" for me ;-)

Well, I guess it's good that we made things more consistent. :) It is
curious that you get failures and I don't, though. I wonder what the
difference is.

One curiosity is that the test does this:

	test_config core.pager ">pager-used; test-tool sigchain"

While "test-tool sigchain" will die with SIGTERM, it's the shell itself
which will waitpid() on it. And so in the end, what Git will generally
see is the same as if the shell had done "exit 143".

I wonder if the difference is between our shells. I know from previous
experience that bash will sometimes directly exec the final command in a
"-c" command, as an optimization. I don't get any difference running the
test with dash or bash, but that makes sense; the pager command is run
internally by Git via "sh -c".

Aha, that's it. If I recompile with SHELL_PATH=/bin/bash, then I see a
failure. Likewise, if I change the test like this:

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 851961c798..a87ef37803 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -741,7 +741,7 @@ test_expect_success TTY 'git skips paging nonexisting command' '
 
 test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager ">pager-used; test-tool sigchain" &&
+	test_config core.pager ">pager-used; exec test-tool sigchain" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&

then it fails even with dash. And that is, I think, closer to what the
test was actually trying to cover (since checking a shell's "exit 143"
is really no different than "exit 1", and we checked that earlier).

So why is it failing? It looks like trace2 reports this as code "-1"
rather than 143. I think that is because the fix from be8fc53e36 (pager:
properly log pager exit code when signalled, 2021-02-02) is incomplete.
It sets WEXITSTATUS() if the pager exited, but it doesn't handle signal
death at all. I think it needs:

diff --git a/run-command.c b/run-command.c
index f40df01c77..ef9d1d4236 100644
--- a/run-command.c
+++ b/run-command.c
@@ -555,6 +555,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	if (in_signal) {
 		if (WIFEXITED(status))
 			code = WEXITSTATUS(status);
+		else if (WIFSIGNALED(status))
+			code = 128 + WTERMSIG(status); /* see comment below */
 		return code;
 	}
 

-Peff