Re: [PATCH v3 02/10] t7422: fix flaky test caused by buffered stdout

Jeff King <peff@xxxxxxxx> · Thu, 9 Jan 2025 02:33:54 -0500

On Tue, Jan 07, 2025 at 01:30:44PM +0100, Patrick Steinhardt wrote:

> The issue is caused by us using grep(1) to terminate the pipe on the
> first matching line in the recursing git-submodule(1) process. Standard
> streams are typically buffered though, so this condition is racy and may
> cause us to terminate the pipe after git-submodule(1) has already
> exited, and in that case we wouldn't see the expected signal.

This patch looks good to me overall, but I think there are a few small
inaccuracies in the commit message.

I don't think buffering is relevant here. Especially since in the
original test there isn't any buffering (the output comes from separate
recursive status processes, so each line gets its own write() call).

The race you're seeing is:

  1. git-submodule (or its child process) writes the first X/S line
     we're trying to match

  2. grep matches the line

  3a. grep exits, closing the pipe

  3b. git-submodule (or its children) writes the rest of its lines.

Steps 3a and 3b happen at the same time without any guarantees. If 3a
happens first, we get SIGPIPE. Otherwise, we don't (and the test fails).

And when git-submodule exits is not important. The critical timing is
when it does its final write(). If the pipe closes after that, even if
it is still running, it will not notice. (I admit that one is a
micro-nit, though).

> Fix the issue by generating a couple thousand nested submodules and
> matching on the first nested submodule. This ensures that the recursive
> git-submodule(1) process completely fills its stdout buffer, which makes
> subsequent writes block until the downstream consumer of the pipe either
> fully drains it or closes it.

One more micro-nit: "fully drains" is not accurate. If grep reads
another 4096 bytes, then that opens up 4096 more bytes in the pipe
buffer. And git-submodule can then write to it. Not important for our
case, since we "know" grep will not read more after matching, but will
just close. So it is really more like "block until the downstream
consumer either reads more or closes it".

That "know" is a little bit of an assumption. In particular, there is no
reason grep could not read into a 1MB buffer, consuming the whole input,
match within it, and only then exit. And then we'd be racy again. In
practice I'm not too worried about this. I'd be surprised by a buffer
bigger than 8k (which is what I saw when I strace'd it on my Linux
system), and your generated input is something like 160k. That should
fill even a generous pipe buffer plus grep input buffer.

> +	git init submodule &&
> +	(
> +		cd submodule &&
> +		test_commit initial &&
> +
> +		COMMIT=$(git rev-parse HEAD) &&
> +		for i in $(test_seq 2000)
> +		do
> +			printf "[submodule \"sm-$i\"]\npath = recursive-submodule-path-$i\n" "$i" ||
> +			return 1
> +		done >gitmodules &&
> +		BLOB=$(git hash-object -w --stdin <gitmodules) &&
> +
> +		printf "100644 blob $BLOB\t.gitmodules\n" >tree &&
> +		for i in $(test_seq 2000)
> +		do
> +			printf "160000 commit $COMMIT\trecursive-submodule-path-%d\n" "$i" ||
> +			return 1
> +		done >>tree &&
> +		TREE=$(git mktree <tree) &&
> +
> +		COMMIT=$(git commit-tree "$TREE") &&
> +		git reset --hard "$COMMIT"
> +	) &&

OK, so the submodule has a huge number of missing children. But that's
enough for us, because the "-" lines generated by "submodule status" are
fine. So you're able to generate the whole thing with only printf
processes running in the loops. Very clever.

> +	git init repo &&
> +	(
> +		cd repo &&
> +		GIT_ALLOW_PROTOCOL=file git submodule add "$(pwd)"/../submodule &&
> +		{ git submodule status --recursive 2>err; echo $?>status; } |
> +			grep -q recursive-submodule-path-1 &&
> +		test_must_be_empty err &&
> +		test_match_signal 13 "$(cat status)"
> +	)

And then adding another repo wrapping it is what makes it recursive.
Nice.

-Peff