On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote:
On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:
The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower that the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.
Phillip Wood (10):
diff --color-moved=zerba: fix alternate coloring
diff --color-moved: avoid false short line matches and bad zerba
coloring
diff: simplify allow-indentation-change delta calculation
diff --color-moved-ws=allow-indentation-change: simplify and optimize
diff --color-moved: call comparison function directly
diff --color-moved: unify moved block growth functions
diff --color-moved: shrink potential moved blocks as we go
diff --color-moved: stop clearing potential moved blocks
diff --color-moved-ws=allow-indentation-change: improve hash lookups
diff --color-moved: intern strings
Nice to see these land after the earlier on-list reference to them.
I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.
Thanks for taking a look at these patches. I take your point about the
assignment, I don't think the actual value matters so long as it's not
DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS.
One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.
Adding some perf tests is a good idea, I'll do that when I reroll which
may take a couple of weeks as I'm going offline for a while at the end
of the week. The tool I have been using is hyperfine[1], it has been
used by a few other contributors (see `git log --grep σ` if you're
interested)
[1] https://github.com/sharkdp/hyperfine
Best Wishes
Phillip