Re: [PATCH v5 13/14] core.fsyncmethod: performance tests for batch mode

Neeraj Singh <nksingh85@xxxxxxxxx> · Wed, 30 Mar 2022 21:09:28 -0700

On Tue, Mar 29, 2022 at 10:05 PM Neeraj Singh via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> From: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
>
> Add basic performance tests for git commands that can add data to the
> object database. We cover:
> * git add
> * git stash
> * git update-index (via git stash)
> * git unpack-objects
> * git commit --all
>
> We cover all currently available fsync methods as well.
>
> Signed-off-by: Neeraj Singh <neerajsi@xxxxxxxxxxxxx>
> ---
>  t/perf/p0008-odb-fsync.sh | 81 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
>  create mode 100755 t/perf/p0008-odb-fsync.sh
>
> diff --git a/t/perf/p0008-odb-fsync.sh b/t/perf/p0008-odb-fsync.sh
> new file mode 100755
> index 00000000000..87092c2627e
> --- /dev/null
> +++ b/t/perf/p0008-odb-fsync.sh
> @@ -0,0 +1,81 @@
> +#!/bin/sh
> +#
> +# This test measures the performance of adding new files to the object
> +# database. The test was originally added to measure the effect of the
> +# core.fsyncMethod=batch mode, which is why we are testing different values of
> +# that setting explicitly and creating a lot of unique objects.
> +
> +test_description="Tests performance of adding things to the object database"
> +
> +. ./perf-lib.sh
> +
> +. $TEST_DIRECTORY/lib-unique-files.sh
> +
> +test_perf_fresh_repo
> +test_checkout_worktree
> +
> +dir_count=10
> +files_per_dir=50
> +total_files=$((dir_count * files_per_dir))
> +
> +populate_files () {
> +       test_create_unique_files $dir_count $files_per_dir files
> +}
> +
> +setup_repo () {
> +       (rm -rf .git || 1) &&
> +       git init &&
> +       test_commit first &&
> +       populate_files
> +}
> +
> +test_perf_fsync_cfgs () {
> +       local method cfg &&
> +       for method in none fsync batch writeout-only
> +       do
> +               case $method in
> +               none)
> +                       cfg="-c core.fsync=none"
> +                       ;;
> +               *)
> +                       cfg="-c core.fsync=loose-object -c core.fsyncMethod=$method"
> +               esac &&
> +

In the last round, I said I'd go with Ævar's scheme for iterating over
configs.  But when looking at the test output, I decided that I wanted
a shorter label for each config rather than the actual command line, to
make the output more readable.

> +               # Set GIT_TEST_FSYNC=1 explicitly since fsync is normally
> +               # disabled by t/test-lib.sh.
> +               if ! test_perf "$1 (fsyncMethod=$method)" \
> +                                               --setup "$2" \
> +                                               "GIT_TEST_FSYNC=1 git $cfg $3"
> +               then
> +                       break
> +               fi
> +       done
> +}

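So here I split the 'git $cfg' prefix off from the actual command
being executed: the helper prepends it, and each caller passes only the
subcommand and its arguments (e.g. "add -- files").  I did it this way
because it wasn't clear to me what the best way to structure this shell
script is.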

The overall effect I want to achieve is to be able to iterate over
every config for each test case so that the different configs of the
same test appear next to each other in the output.

> +
> +test_perf_fsync_cfgs "add $total_files files" \
> +       "setup_repo" \
> +       "add -- files"
> +

I initially tried leaving the $cfg variable unexpanded in the test
body, like this:
'git $cfg add -- files'

and then using eval in test_perf_fsync_cfgs to make the variable
substitution happen later.
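
For reference, a minimal sketch of that eval variant.  The
fake_test_perf helper and the perf_fsync_cfgs_eval name are
hypothetical stand-ins so the snippet runs outside t/perf; the real
code would call test_perf and set GIT_TEST_FSYNC=1.  It also assumes
the template contains no double quotes, since eval re-parses it inside
a double-quoted assignment.

```shell
#!/bin/sh

# Hypothetical stand-in for test_perf: print the label and the command
# instead of timing anything.
fake_test_perf () {
	printf '%s: %s\n' "$1" "$2"
}

# $1 is the test label, $2 is a single-quoted command template that
# still contains a literal $cfg.
perf_fsync_cfgs_eval () {
	label=$1
	template=$2
	for method in none fsync batch writeout-only
	do
		case $method in
		none)
			cfg="-c core.fsync=none"
			;;
		*)
			cfg="-c core.fsync=loose-object -c core.fsyncMethod=$method"
			;;
		esac
		# eval re-parses the template, so $cfg is substituted here,
		# once per loop iteration, rather than at the call site.
		eval "cmd=\"$template\""
		fake_test_perf "$label (fsyncMethod=$method)" "$cmd"
	done
}

# The single quotes keep $cfg unexpanded until the eval above.
perf_fsync_cfgs_eval "add files" 'git $cfg add -- files'
```

The caller gets to write the full command line, at the cost of an
extra layer of quoting that every test body has to get right.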

Is there a better way to write this?

Thanks,
Neeraj