On 6/28/2019 5:39 AM, Jeff King wrote: > Some tests need to create a string of commits. Doing this with > test_commit is very heavy-weight, as it needs at least one process per > commit (and in fact, uses several). > > For bulk creation, we can do much better by using fast-import, but it's > often a pain to generate the input. Let's provide a helper to do so. What a quick turnaround! I'm happy to nerd-snipe you here, and am glad the result was so positive. > We'll use t5310 as a guinea pig, as it has three 10-commit loops. Here > are hyperfine results before and after: > > [before] > Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests > Time (mean ± σ): 2.846 s ± 0.305 s [User: 3.042 s, System: 0.919 s] > Range (min … max): 2.250 s … 3.210 s 10 runs > > [after] > Benchmark #1: ./t5310-pack-bitmaps.sh --root=/var/ram/git-tests > Time (mean ± σ): 2.210 s ± 0.174 s [User: 2.570 s, System: 0.604 s] > Range (min … max): 1.999 s … 2.590 s 10 runs I ran these tests on my Windows machine, where the process startup time is a higher cost. The improvement is noticeable from just watching the test lines pause on the steps creating the commits. Before: 30.8-31.2s After: 23.5-23.8s > So we're over 20% faster, while making the callers slightly shorter. I see about the same relative measurement (~23%). The callers are a bit cleaner, which is good. They are also slightly less clear of what's happening, but that's the cost of abstraction. Definitely worth it in this case! > +# Similar to test_commit, but efficiently create <nr> commits, each with a > +# unique number $n (from 1 to <nr> by default) in the commit message. > +# > +# Usage: test_commit_bulk [options] <nr> > +# -C <dir>: > +# Run all git commands in directory <dir> > +# --ref=<n>: > +# ref on which to create commits (default: HEAD) > +# --start=<n>: > +# number commit messages from <n> (default: 1) > +# --message=<msg>: > +# use <msg> as the commit mesasge (default: "commit $n") > +# --filename=<fn>: > +# modify <fn> in each commit (default: $n.t) > +# --contents=<string>: > +# place <string> in each file (default: "content $n") > +# --id=<string>: > +# shorthand to use <string> and $n in message, filename, and contents > +# > +# The message, filename, and contents strings are evaluated by the shell inside > +# double-quotes, with $n set to the current commit number. So you can do: > +# > +# test_commit_bulk --filename=file --contents='modification $n' > +# > +# to have every commit touch the same file, but with unique content. Spaces are > +# OK, but you must escape any metacharacters (like backslashes or > +# double-quotes) you do not want expanded. > +# I appreciate all the documentation here! > + while test "$total" -gt 0 > + do > + echo "commit $ref" && > + printf 'author %s <%s> %s\n' \ > + "$GIT_AUTHOR_NAME" \ > + "$GIT_AUTHOR_EMAIL" \ > + "$cur_time -0700" && > + printf 'committer %s <%s> %s\n' \ > + "$GIT_COMMITTER_NAME" \ > + "$GIT_COMMITTER_EMAIL" \ > + "$cur_time -0700" && > + echo "data <<EOF" && > + eval "echo \"$message\"" && > + echo "EOF" && > + eval "echo \"M 644 inline $filename\"" && > + echo "data <<EOF" && > + eval "echo \"$contents\"" && > + echo "EOF" && > + echo && > + n=$((n + 1)) && > + cur_time=$((cur_time + 1)) && > + total=$((total - 1)) || > + echo "poison fast-import stream" > + done I am not very good at the nitty-gritty details of our scripts, but looking at this I wonder if there is a cleaner and possibly faster way to do this loop. The top thing on my mind are the 'eval "echo X"' lines. If they start processes, then we can improve the performance. If not, then it may not be worth it. In wonder if instead we could create some format string outside the loop and then pass the values that change between iterations into that format string. Thanks, -Stolee