Re: [PATCH 1/6] test-lib: introduce test_commit_bulk

Derrick Stolee <stolee@xxxxxxxxx> · Fri, 28 Jun 2019 08:35:28 -0400

On 6/28/2019 5:39 AM, Jeff King wrote:
> Some tests need to create a string of commits. Doing this with
> test_commit is very heavy-weight, as it needs at least one process per
> commit (and in fact, uses several).
> 
> For bulk creation, we can do much better by using fast-import, but it's
> often a pain to generate the input. Let's provide a helper to do so.

What a quick turnaround! I'm happy to nerd-snipe you here, and am glad
the result was so positive.

> We'll use t5310 as a guinea pig, as it has three 10-commit loops. Here
> are hyperfine results before and after:
> 
>   [before]
>   Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests
>     Time (mean ± σ):      2.846 s ±  0.305 s    [User: 3.042 s, System: 0.919 s]
>     Range (min … max):    2.250 s …  3.210 s    10 runs
> 
>   [after]
>   Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests
>     Time (mean ± σ):      2.210 s ±  0.174 s    [User: 2.570 s, System: 0.604 s]
>     Range (min … max):    1.999 s …  2.590 s    10 runs

I ran these tests on my Windows machine, where the process startup time is
a higher cost. The improvement is noticeable from just watching the test lines
pause on the steps creating the commits.

 Before: 30.8-31.2s
  After: 23.5-23.8s

> So we're over 20% faster, while making the callers slightly shorter.

I see about the same relative measurement (~23%). The callers are a bit
cleaner, which is good. They are also slightly less clear of what's
happening, but that's the cost of abstraction. Definitely worth it in
this case!

> +# Similar to test_commit, but efficiently create <nr> commits, each with a
> +# unique number $n (from 1 to <nr> by default) in the commit message.
> +#
> +# Usage: test_commit_bulk [options] <nr>
> +#   -C <dir>:
> +#	Run all git commands in directory <dir>
> +#   --ref=<n>:
> +#	ref on which to create commits (default: HEAD)
> +#   --start=<n>:
> +#	number commit messages from <n> (default: 1)
> +#   --message=<msg>:
> +#	use <msg> as the commit mesasge (default: "commit $n")
> +#   --filename=<fn>:
> +#	modify <fn> in each commit (default: $n.t)
> +#   --contents=<string>:
> +#	place <string> in each file (default: "content $n")
> +#   --id=<string>:
> +#	shorthand to use <string> and $n in message, filename, and contents
> +#
> +# The message, filename, and contents strings are evaluated by the shell inside
> +# double-quotes, with $n set to the current commit number. So you can do:
> +#
> +#   test_commit_bulk --filename=file --contents='modification $n'
> +#
> +# to have every commit touch the same file, but with unique content. Spaces are
> +# OK, but you must escape any metacharacters (like backslashes or
> +# double-quotes) you do not want expanded.
> +#

I appreciate all the documentation here!

> +		while test "$total" -gt 0
> +		do
> +			echo "commit $ref" &&
> +			printf 'author %s <%s> %s\n' \
> +				"$GIT_AUTHOR_NAME" \
> +				"$GIT_AUTHOR_EMAIL" \
> +				"$cur_time -0700" &&
> +			printf 'committer %s <%s> %s\n' \
> +				"$GIT_COMMITTER_NAME" \
> +				"$GIT_COMMITTER_EMAIL" \
> +				"$cur_time -0700" &&
> +			echo "data <<EOF" &&
> +			eval "echo \"$message\"" &&
> +			echo "EOF" &&
> +			eval "echo \"M 644 inline $filename\"" &&
> +			echo "data <<EOF" &&
> +			eval "echo \"$contents\"" &&
> +			echo "EOF" &&
> +			echo &&
> +			n=$((n + 1)) &&
> +			cur_time=$((cur_time + 1)) &&
> +			total=$((total - 1)) ||
> +			echo "poison fast-import stream"
> +		done

I am not very good at the nitty-gritty details of our scripts, but
looking at this I wonder if there is a cleaner and possibly faster
way to do this loop. The top thing on my mind are the 'eval "echo X"'
lines. If they start processes, then we can improve the performance.
If not, then it may not be worth it.

In wonder if instead we could create some format string outside the
loop and then pass the values that change between iterations into
that format string.

Thanks,
-Stolee