Re: [PATCH v2 4/4] t/perf: add fsmonitor perf test for git diff

On Mon, Oct 19, 2020 at 09:35:15PM +0000, Nipunn Koorapati via GitGitGadget wrote:
> From: Nipunn Koorapati <nipunn@xxxxxxxxxxx>
>
> Results for the git-diff fsmonitor optimization
> in patch in the parent-rev (using a 400k file repo to test)
>
> As you can see here - git diff with fsmonitor running is
> significantly better with this patch series (80% faster on my
> workload)!

These t/perf numbers are very helpful, at least to me.

> GIT_PERF_LARGE_REPO=~/src/server ./run v2.29.0-rc1 . -- p7519-fsmonitor.sh
>
> Test                                                                     v2.29.0-rc1       this tree
> -----------------------------------------------------------------------------------------------------------------
> 7519.2: status (fsmonitor=.git/hooks/fsmonitor-watchman)                 1.46(0.82+0.64)   1.47(0.83+0.62) +0.7%
> 7519.3: status -uno (fsmonitor=.git/hooks/fsmonitor-watchman)            0.16(0.12+0.04)   0.17(0.12+0.05) +6.3%
> 7519.4: status -uall (fsmonitor=.git/hooks/fsmonitor-watchman)           1.36(0.73+0.62)   1.37(0.76+0.60) +0.7%

Looks like about 0.01 sec of overhead, which seems like an acceptable
trade-off for when the user has at least 10,000 files.
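
For anyone cross-checking the last column: a minimal sketch, assuming the
percentage is the relative change in elapsed time (the first number in each
cell) between the two revisions. pct_change is a hypothetical helper written
for this mail, not something that ships in t/perf.

```shell
# pct_change OLD NEW: relative change of NEW against OLD, printed as a
# signed percentage with one decimal place.
# Hypothetical helper for illustration; not part of t/perf.
pct_change () {
	awk -v old="$1" -v new="$2" \
		'BEGIN { printf "%+.1f%%\n", 100 * (new - old) / old }'
}

pct_change 0.85 0.14	# elapsed times from 7519.5 (diff)
pct_change 0.12 0.13	# elapsed times from 7519.6 (diff -- 0_files)
```

With the elapsed times from the table, this reproduces the -83.5% and +8.3%
figures reported for 7519.5 and 7519.6.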

This reminds me; did you look at the 'git add' performance change? I
recall Junio mentioning that 'git add' takes the same paths in the code.

> 7519.5: diff (fsmonitor=.git/hooks/fsmonitor-watchman)                   0.85(0.22+0.63)   0.14(0.10+0.05) -83.5%
> 7519.6: diff -- 0_files (fsmonitor=.git/hooks/fsmonitor-watchman)        0.12(0.08+0.05)   0.13(0.11+0.02) +8.3%
> 7519.7: diff -- 10_files (fsmonitor=.git/hooks/fsmonitor-watchman)       0.12(0.08+0.04)   0.13(0.09+0.04) +8.3%
> 7519.8: diff -- 100_files (fsmonitor=.git/hooks/fsmonitor-watchman)      0.12(0.07+0.05)   0.13(0.07+0.06) +8.3%
> 7519.9: diff -- 1000_files (fsmonitor=.git/hooks/fsmonitor-watchman)     0.12(0.09+0.04)   0.13(0.08+0.05) +8.3%
> 7519.10: diff -- 10000_files (fsmonitor=.git/hooks/fsmonitor-watchman)   0.14(0.09+0.05)   0.13(0.10+0.03) -7.1%

OK... so with fsmonitor turned on, this series adds an imperceptible
slow-down (about 0.01 sec) to pathspec-limited diffs covering [0, 10000)
files. But, in exchange, you get much-improved whole-tree performance, as
well as better pathspec-limited performance when the pathspec covers at
least 10,000 files.

I was going to say that this has little downside, because turning on
fsmonitor is probably a good indicator that your repository has at least
10,000 files, but I think that's missing the point. That is likely true,
but it doesn't exclude the possibility of directories with fewer than
10,000 files, which users may very well still be diff-ing.

So, there's a slow-down, but it's hard to complain when you consider
what we get in exchange.

> 7519.12: status (fsmonitor=)                                             1.67(0.93+1.49)   1.67(0.99+1.42) +0.0%
> 7519.13: status -uno (fsmonitor=)                                        0.37(0.30+0.82)   0.37(0.33+0.79) +0.0%
> 7519.14: status -uall (fsmonitor=)                                       1.58(0.97+1.35)   1.57(0.86+1.45) -0.6%
> 7519.15: diff (fsmonitor=)                                               0.34(0.28+0.83)   0.34(0.27+0.83) +0.0%
> 7519.16: diff -- 0_files (fsmonitor=)                                    0.09(0.06+0.04)   0.09(0.08+0.02) +0.0%
> 7519.17: diff -- 10_files (fsmonitor=)                                   0.09(0.07+0.03)   0.09(0.06+0.05) +0.0%
> 7519.18: diff -- 100_files (fsmonitor=)                                  0.09(0.06+0.04)   0.09(0.06+0.04) +0.0%
> 7519.19: diff -- 1000_files (fsmonitor=)                                 0.09(0.06+0.04)   0.09(0.05+0.05) +0.0%
> 7519.20: diff -- 10000_files (fsmonitor=)                                0.10(0.08+0.04)   0.10(0.06+0.05) +0.0%

Great! No slow-down without fsmonitor enabled, as expected. Fantastic.

> I also added a benchmark for a tiny git diff workload w/ a pathspec.
> I see an approximately .02 second overhead added w/ and w/o fsmonitor
>
> From looking at these results, I suspected that refresh_fsmonitor
> is already happening during git diff - independent of this patch
> series' optimization. Confirmed that suspicion by breaking on
> refresh_fsmonitor.

So, the overhead that we're paying is purely the pipe+fork+exec? I.e.,
that watchman has already computed an answer in the earlier call, and we
just have to read it again (or find out that the last results were
unchanged)?

> (gdb) bt  [simplified]
> 0  refresh_fsmonitor  at fsmonitor.c:176
> 1  ie_match_stat  at read-cache.c:375
> 2  match_stat_with_submodule at diff-lib.c:237
> 4  builtin_diff_files  at builtin/diff.c:260
> 5  cmd_diff  at builtin/diff.c:541
> 6  run_builtin  at git.c:450
> 7  handle_builtin  at git.c:700
> 8  run_argv  at git.c:767
> 9  cmd_main  at git.c:898
> 10 main  at common-main.c:52

:-).

> Signed-off-by: Nipunn Koorapati <nipunn@xxxxxxxxxxx>
> ---
>  t/perf/p7519-fsmonitor.sh | 71 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 71 insertions(+)
>
> diff --git a/t/perf/p7519-fsmonitor.sh b/t/perf/p7519-fsmonitor.sh
> index 9313d4a51d..2b4803707f 100755
> --- a/t/perf/p7519-fsmonitor.sh
> +++ b/t/perf/p7519-fsmonitor.sh
> @@ -115,6 +115,13 @@ test_expect_success "setup for fsmonitor" '

Everything in here looks very reasonable to me, except for the seq vs.
test_seq() issue that I pointed out in another email in this thread.
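
For context on that point: seq is not available on every platform that git's
test suite runs on, which is why the suite carries its own test_seq. A
minimal sketch of the idea, using only POSIX shell constructs; portable_seq
is a hypothetical stand-in for illustration, not git's actual test_seq
implementation.

```shell
# portable_seq [FIRST] LAST: print the integers FIRST..LAST, one per line.
# FIRST defaults to 1 when only one argument is given.
# Hypothetical stand-in for illustration; not git's test_seq.
portable_seq () {
	if test $# -eq 1
	then
		set -- 1 "$1"
	fi
	i=$1
	while test "$i" -le "$2"
	do
		echo "$i"
		i=$((i + 1))
	done
}

portable_seq 3
```

In the perf script itself, the fix is simply to call test_seq wherever seq is
used today.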

It's too bad that we have to write these twice, but that's not the fault
of your patch.

Thanks,
Taylor


