On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote: > The current implementation of diff --color-moved-ws=allow-indentation-change > is considerably slower that the implementation of diff --color-moved which > is in turn slower than a regular diff. This patch series starts with a > couple of bug fixes and then reworks the implementation of diff > --color-moved and diff --color-moved-ws=allow-indentation-change to speed > them up on large diffs. The time to run git diff --color-moved > --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run > git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0 > v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs > with --color-moved - the time to run git log -p --color-moved > --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent > processors. On older processors these patches reduce the running time in all > cases that I've tested. In general the larger the diff the larger the speed > up. As an extreme example the time to run diff --color-moved > --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8 > minutes to 6 seconds. > > Phillip Wood (10): > diff --color-moved=zerba: fix alternate coloring > diff --color-moved: avoid false short line matches and bad zerba > coloring > diff: simplify allow-indentation-change delta calculation > diff --color-moved-ws=allow-indentation-change: simplify and optimize > diff --color-moved: call comparison function directly > diff --color-moved: unify moved block growth functions > diff --color-moved: shrink potential moved blocks as we go > diff --color-moved: stop clearing potential moved blocks > diff --color-moved-ws=allow-indentation-change: improve hash lookups > diff --color-moved: intern strings Nice to see these land after the earlier on-list reference to them. I skimmed these mostly, and am not familiar with this code, but didn't see any glaring things missing. There was one existing oddity with assigning a 0 to an "enum diff_symbol", don't we want DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch in 02/10, and pre-dates these changes. One thing I would very much like to see here is a conversion of the existing ad-hoc benchmarks you note in commit messages to something that lives in t/perf/, it really helps future maintenance of perf-sensitive code to be able to re-run those, and I for one find the output much easier to read than whatever tool you're using to produce your benchmarks.