Re: [PATCH 00/10] diff --color-moved[-ws] speedups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:

> The current implementation of diff --color-moved-ws=allow-indentation-change
> is considerably slower that the implementation of diff --color-moved which
> is in turn slower than a regular diff. This patch series starts with a
> couple of bug fixes and then reworks the implementation of diff
> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
> them up on large diffs. The time to run git diff --color-moved
> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
> with --color-moved - the time to run git log -p --color-moved
> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
> processors. On older processors these patches reduce the running time in all
> cases that I've tested. In general the larger the diff the larger the speed
> up. As an extreme example the time to run diff --color-moved
> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
> minutes to 6 seconds.
>
> Phillip Wood (10):
>   diff --color-moved=zerba: fix alternate coloring
>   diff --color-moved: avoid false short line matches and bad zerba
>     coloring
>   diff: simplify allow-indentation-change delta calculation
>   diff --color-moved-ws=allow-indentation-change: simplify and optimize
>   diff --color-moved: call comparison function directly
>   diff --color-moved: unify moved block growth functions
>   diff --color-moved: shrink potential moved blocks as we go
>   diff --color-moved: stop clearing potential moved blocks
>   diff --color-moved-ws=allow-indentation-change: improve hash lookups
>   diff --color-moved: intern strings

Nice to see these land after the earlier on-list reference to them.

I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.

One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux