Re: [PATCH 1/6] test-lib: introduce test_commit_bulk

Jeff King <peff@xxxxxxxx> · Fri, 28 Jun 2019 20:09:43 -0400

On Fri, Jun 28, 2019 at 08:35:28AM -0400, Derrick Stolee wrote:

> > +		while test "$total" -gt 0
> > +		do
> > +			echo "commit $ref" &&
> > +			printf 'author %s <%s> %s\n' \
> > +				"$GIT_AUTHOR_NAME" \
> > +				"$GIT_AUTHOR_EMAIL" \
> > +				"$cur_time -0700" &&
> > +			printf 'committer %s <%s> %s\n' \
> > +				"$GIT_COMMITTER_NAME" \
> > +				"$GIT_COMMITTER_EMAIL" \
> > +				"$cur_time -0700" &&
> > +			echo "data <<EOF" &&
> > +			eval "echo \"$message\"" &&
> > +			echo "EOF" &&
> > +			eval "echo \"M 644 inline $filename\"" &&
> > +			echo "data <<EOF" &&
> > +			eval "echo \"$contents\"" &&
> > +			echo "EOF" &&
> > +			echo &&
> > +			n=$((n + 1)) &&
> > +			cur_time=$((cur_time + 1)) &&
> > +			total=$((total - 1)) ||
> > +			echo "poison fast-import stream"
> > +		done
> 
> I am not very good at the nitty-gritty details of our scripts, but
> looking at this I wonder if there is a cleaner and possibly faster
> way to do this loop. The top thing on my mind are the 'eval "echo X"'
> lines. If they start processes, then we can improve the performance.
> If not, then it may not be worth it.

No, evals by themselves don't require a process.  That whole loop should
run as a single process (it does need a subshell, because it's the
left-hand side of the pipe).
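
To illustrate the difference, here is a minimal sketch (not code from
the patch; the function names eval_demo and pipe_demo are made up for
the example). The eval runs in the current shell, so its assignment
sticks, while the same assignment on the left-hand side of a pipe is
lost with the subshell:

```shell
#!/bin/sh
# eval is a shell builtin: it runs in the current shell, so variable
# changes it makes are still visible afterwards.
eval_demo () {
	n=1
	message='commit $n'
	eval "echo \"$message\"" &&
	eval "n=\$((n + 1))" &&
	echo "after eval: n=$n"
}

# The left-hand side of a pipe runs in a subshell, so the increment is
# lost once the pipeline finishes.
pipe_demo () {
	n=1
	{ n=$((n + 1)); echo "inside pipe: n=$n"; } | cat
	echo "after pipe: n=$n"
}

eval_demo
pipe_demo
```

The last line printed by pipe_demo still reports n=1, which is the
reason the loop in the patch cannot hand an updated counter or timestamp
back to its caller.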

We could drop even that process by writing into a temporary file. The
size probably wouldn't be a big deal, and I doubt the latency would
matter much (and when you're running the tests in parallel anyway, CPU
time is the most important metric).

It might also make the code a little simpler, since we'd be running in
the main shell and could just use test_tick naturally (rather than the
manual addition hackery).

I'll take a look.

I wasn't super concerned with eliminating processes here as long as the
number of them is constant with respect to the number of commits we're
generating. The big improvement is taking, say, 300 test_commit calls
and turning it into a single bulk call. Replacing a single-commit
test_commit with this would be break-even at best.

> I wonder if instead we could create some format string outside the
> loop and then pass the values that change between iterations into
> that format string.

The evals should be fast. But they are potentially error-prone, since
callers have to pass something like --message='commit $n' with single
quotes to keep the "$" intact. But because all of our test snippets are
inside single-quotes already, you end up with:

  test_commit_bulk --message="commit \$n"

(though in practice most of the callers used the --id shorthand, which
neatly sidesteps this).

Since there's literally only one variable to interpolate, we could swap
this out for using printf formatters, and letting "%s" mean the same as
"$n". It should perform the same but is a bit less magical and a bit
harder to screw up. It would also be easier to handle if
test_commit_bulk eventually became C code. The only downside I can think
of is that you can't mention "%s" twice, but I find it hard to imagine a
caller would want that anyway.

So I'll also take a look at that.
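
A minimal sketch of the printf idea, assuming a hypothetical helper
named expand (not part of the patch) and placeholder templates:

```shell
#!/bin/sh
# Expand a caller-supplied template, with "%s" standing in for the
# commit counter. There is no eval, so the caller has no "$" to
# protect from the shell.
expand () {
	# $1 is the template, $2 is the counter
	printf "$1\n" "$2"
}

message='commit %s'
filename='%s.t'
n=1
while test "$n" -le 3
do
	expand "$message" "$n" &&
	expand "$filename" "$n" &&
	n=$((n + 1)) ||
	exit 1
done
```

With a single argument, a second "%s" in the template expands to an
empty string, which is the "can't mention it twice" limitation. A
literal "%" in a template would also have to be written as "%%".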

-Peff