Re: [PATCH v3 0/2] Teach diff to honor diff algorithms set through git attributes

Elijah Newren <newren@xxxxxxxxx> · Fri, 17 Feb 2023 17:16:04 -0800

On Fri, Feb 17, 2023 at 12:21 PM John Cai via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> When a repository contains different kinds of files, it may be desirable to
> use different algorithms based on file type. This is currently not feasible
> through the command line or using git configs. However, we can leverage the
> fact that gitattributes are path aware.
>
> Teach the diff machinery to check gitattributes when diffing files by using
> the existing diff. scheme, and add an "algorithm" type to the external
> driver config.
[...]
> To address some of the performance concerns in the previous series, a
> benchmark shows that a performance penalty is no longer incurred, now that
> we are no longer adding an additional attributes parsing call:
>
> $ hyperfine -r 5 -L a bin-wrappers/git,git '{a} diff v2.0.0 v2.28.0'
> Benchmark 1: git-bin-wrapper diff v2.0.0 v2.28.0 Time (mean ± σ): 1.072 s ±
> 0.289 s [User: 0.626 s, System: 0.081 s] Range (min … max): 0.772 s … 1.537
> s 5 runs
>
> Benchmark 2: git diff v2.0.0 v2.28.0 Time (mean ± σ): 1.003 s ± 0.065 s
> [User: 0.684 s, System: 0.067 s] Range (min … max): 0.914 s … 1.091 s 5 runs
>
> Summary 'git diff v2.0.0 v2.28.0' ran 1.07 ± 0.30 times faster than
> 'git-bin-wrapper diff v2.0.0 v2.28.0'

I'm sorry, I don't understand this.  What are you measuring?  I
presume bin-wrappers/git refers to the version of git built with your
changes, but what version of git does "git" refer to?  Also, do you
have any .gitattributes or .git/config changes present when you are
testing to trigger the new functionality you have written?

Also, doesn't this benchmark demonstrate the opposite of your claim?
You said there was no performance penalty, but the benchmark shows a
7% slowdown.  We've battled hard to get smaller improvements than
that, so this is still worrisome, even if it's no longer a factor of 2
or whatever it was.  But, again, I'm not sure what is being measured.
If the difference is because patience diff was used for some files,
then it's not an apples-to-apples comparison, and a 7% slowdown would
be no cause for concern.

Since I was curious, I compiled both a version of git from directly
before your series, and directly after, then added a '*.[ch]
diff=other' line to the end of .gitattributes, then ran:

$ hyperfine -L a ./older-git,./newer-git '{a} -c
diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0'
Benchmark 1: ./older-git -c diff.other.algorithm=myers diff --numstat
v2.0.0 v2.28.0
  Time (mean ± σ):     870.2 ms ±   4.4 ms    [User: 755.2 ms, System: 109.8 ms]
  Range (min … max):   861.0 ms … 876.8 ms    10 runs

Benchmark 2: ./newer-git -c diff.other.algorithm=myers diff --numstat
v2.0.0 v2.28.0
  Time (mean ± σ):     876.9 ms ±   4.8 ms    [User: 758.0 ms, System: 113.1 ms]
  Range (min … max):   870.7 ms … 884.1 ms    10 runs

Summary
  './older-git -c diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0' ran
    1.01 ± 0.01 times faster than './newer-git -c
diff.other.algorithm=myers diff --numstat v2.0.0 v2.28.0'

I specifically specified 'myers' to match what we'd get from the
default anyway, so I would only be testing the slowdown from the
.gitattribute parsing.  So, I think the performance overhead comes out
to just 1% rather than 7% (and further that's when I make it only
print overall stats about the diff rather than the full diff, since I
know that's faster.  If I didn't do that, the perf hit might appear to
be less than 1%).