Re: [PATCH 00/10] diff --color-moved[-ws] speedups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote:

On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower that the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (10):
   diff --color-moved=zerba: fix alternate coloring
   diff --color-moved: avoid false short line matches and bad zerba
     coloring
   diff: simplify allow-indentation-change delta calculation
   diff --color-moved-ws=allow-indentation-change: simplify and optimize
   diff --color-moved: call comparison function directly
   diff --color-moved: unify moved block growth functions
   diff --color-moved: shrink potential moved blocks as we go
   diff --color-moved: stop clearing potential moved blocks
   diff --color-moved-ws=allow-indentation-change: improve hash lookups
   diff --color-moved: intern strings

Nice to see these land after the earlier on-list reference to them.

I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.

Thanks for taking a look at these patches. I take your point about the assignment, I don't think the actual value matters so long as it's not DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS.

One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.

Adding some perf tests is a good idea, I'll do that when I reroll which may take a couple of weeks as I'm going offline for a while at the end of the week. The tool I have been using is hyperfine[1], it has been used by a few other contributors (see `git log --grep σ` if you're interested)

[1] https://github.com/sharkdp/hyperfine

Best Wishes

Phillip




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux