Re: [PATCH 02/10] t7422: fix flaky test caused by buffered stdout

Jeff King <peff@xxxxxxxx> · Fri, 3 Jan 2025 13:17:39 -0500

On Fri, Jan 03, 2025 at 03:46:39PM +0100, Patrick Steinhardt wrote:

> One test in t7422 asserts that `git submodule status --recursive`
> properly handles SIGPIPE. This test is flaky though and may sometimes
> not see a SIGPIPE at all:
> 
>     expecting success of 7422.18 'git submodule status --recursive propagates SIGPIPE':
>             { git submodule status --recursive 2>err; echo $?>status; } |
>                     grep -q X/S &&
>             test_must_be_empty err &&
>             test_match_signal 13 "$(cat status)"

I couldn't reproduce with --stress, but you can trigger it all the time
with:

diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh
index f21e920367..9338c75626 100755
--- a/t/t7422-submodule-output.sh
+++ b/t/t7422-submodule-output.sh
@@ -168,7 +168,7 @@ done
 
 test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE' '
 	{ git submodule status --recursive 2>err; echo $?>status; } |
-		grep -q X/S &&
+		{ sleep 1 && grep -q X/S; } &&
 	test_must_be_empty err &&
 	test_match_signal 13 "$(cat status)"
 '

The problem is that git-submodule may write all of its output before
grep exits, and it gets stored in the pipe buffer. And then even if grep
exits before reading all of it, it is too late for SIGPIPE, and the data
in the pipe is just discarded by the OS.

So this:

> The issue is caused by us using grep(1) to terminate the pipe on the
> first matching line in the recursing git-submodule(1) process. Standard
> streams are typically buffered though, so this condition is racy and may
> cause us to terminate the pipe after git-submodule(1) has already
> exited, and in that case we wouldn't see the expected signal.
> 
> Fix the issue by converting standard streams to be unbuffered. I have
> only been able to reproduce this issue a single time after running t7422
> with `--stress` after an extended amount of time, so I cannot claim to
> be fully certain that this fix is sufficient.

isn't quite right. Even without input buffering on grep's part, it may
be too slow to read the data. And adding a sleep as above shows that it
still fails with your patch.

The usual way to reliably get SIGPIPE is to make sure the writer
produces enough data to fill the pipe buffer. But it's tricky to get
"submodule status" to produce a lot of data without having a ton of
submodules, which is expensive to set up.

But we can hack around it by stuffing the pipe full with a separate
process. Like this:

diff --git a/t/t7422-submodule-output.sh b/t/t7422-submodule-output.sh
index f21e920367..c4df2629e8 100755
--- a/t/t7422-submodule-output.sh
+++ b/t/t7422-submodule-output.sh
@@ -167,8 +167,15 @@ do
 done
 
 test_expect_success !MINGW 'git submodule status --recursive propagates SIGPIPE' '
-	{ git submodule status --recursive 2>err; echo $?>status; } |
-		grep -q X/S &&
+	{
+		# stuff pipe buffer full of input so that submodule status
+		# will require blocking on write; this script will write over
+		# 128kb. It might itself get SIGPIPE, so we must not &&-chain
+		# it directly.
+		{ perl -le "print q{foo} for (1..33000)" || true; } &&
+		git submodule status --recursive 2>err
+		echo $? >status
+	} | { sleep 1 && head -n 1 >/dev/null; } &&
 	test_must_be_empty err &&
 	test_match_signal 13 "$(cat status)"
 '
A few notes:

  - the sleep is still there to demonstrate that it always works, but
    obviously we'd want to remove that

  - I swapped out "grep" for "head". What we are matching is not
    relevant; the important thing is that the reader closes the pipe
    immediately. So I guess in that sense we could probably even just
    pipe to "true" or similar.

  - I tried using test_seq to avoid the inline perl, but it doesn't
    work! The problem is that it's implemented as a shell function. So
    when it gets SIGPIPE, the whole subshell is killed, and we never
    even run git-submodule at all. So it has to be a separate process
    (though I guess it could be test_seq in a subshell).

Anyway, hopefully that gives you enough to play around with.

-Peff