[PATCH v2 00/12] diff --color-moved[-ws] speedups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks to Ævar and Elijah for their comments, I've reworded the commit
messages, addressed the enum initialization issue in patch 2 (now 3) and
added some perf tests.

There are two new patches in this round. The first patch is new and adds the
perf tests suggested by Ævar, the penultimate patch is also new and coverts
the existing code to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new
tests, the percentage changes are broadly similar to the previous results
though I ended up running them on a different computer this time.

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower that the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (12):
  diff --color-moved: add perf tests
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zerba
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved: intern strings

 diff.c                           | 377 ++++++++++++-------------------
 t/perf/p4002-diff-color-moved.sh |  45 ++++
 t/t4015-diff-whitespace.sh       | 137 +++++++++++
 3 files changed, 323 insertions(+), 236 deletions(-)
 create mode 100755 t/perf/p4002-diff-color-moved.sh


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Range-diff vs v1:

  -:  ----------- >  1:  8fc8914a37b diff --color-moved: add perf tests
  1:  374dbebcbf2 !  2:  9b4e4d2674a diff --color-moved=zerba: fix alternate coloring
     @@ Metadata
      Author: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
      
       ## Commit message ##
     -    diff --color-moved=zerba: fix alternate coloring
     +    diff --color-moved=zebra: fix alternate coloring
      
          b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
          alternation", 2018-11-23) sought to avoid using the alternate colors
  2:  3d02a0a91a0 !  3:  5512145c70f diff --color-moved: avoid false short line matches and bad zerba coloring
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       	int pmb_nr = 0, pmb_alloc = 0;
       	int n, flipped_block = 0, block_length = 0;
      -	enum diff_symbol last_symbol = 0;
     -+	enum diff_symbol moved_symbol = 0;
     ++	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
       
       
       	for (n = 0; n < o->emitted_symbols->nr; n++) {
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      -			last_symbol = l->s;
      +		}
      +		if (!match) {
     -+			moved_symbol = 0;
     ++			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
       			continue;
       		}
       
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      +			if (pmb_nr)
      +				moved_symbol = l->s;
      +			else
     -+				moved_symbol = 0;
     ++				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
      +
       			block_length = 0;
       		}
  3:  30f0ed44768 =  4:  93fdef30d64 diff: simplify allow-indentation-change delta calculation
  4:  ebb6eec1d92 !  5:  6b7a8aed4ec diff --color-moved-ws=allow-indentation-change: simplify and optimize
     @@ Commit message
          comparison to filter out the non-matching lines. Fixing this reduces
          time to run
            git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -    by 88% and simplifies the code.
     +    by 93% compared to master and simplifies the code.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
     -      Range (min … max):    9.917 s … 10.037 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
     -      Range (min … max):    1.214 s …  1.226 s    10 runs
     +    Test                                                                  HEAD^              HEAD
     +    ---------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41( 0.38+0.03)   0.41(0.37+0.04)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83( 0.79+0.04)   0.82(0.79+0.02)  -1.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  13.68(13.59+0.07)   0.92(0.89+0.03) -93.3%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.22+0.08)   1.31(1.21+0.10)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.40+0.07)   1.47(1.36+0.10)  +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.87( 1.77+0.09)   1.50(1.41+0.09) -19.8%
      
          Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
      
  5:  cec0c2d04d7 !  6:  cfbdd447eee diff --color-moved: call comparison function directly
     @@ Metadata
       ## Commit message ##
          diff --color-moved: call comparison function directly
      
     -    Calling xdiff_compare_lines() directly rather than using a function
     -    pointer from the hash map reduces the time very slightly but more
     -    importantly it will allow us to easily combine pmb_advance_or_null()
     -    and pmb_advance_or_null_multi_match() in the next commit.
     +    This change will allow us to easily combine pmb_advance_or_null() and
     +    pmb_advance_or_null_multi_match() in the next commit. Calling
     +    xdiff_compare_lines() directly rather than using a function pointer
     +    from the hash map has little effect on the run time.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
     -      Range (min … max):    1.130 s …  1.141 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
     -      Range (min … max):    1.114 s …  1.121 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    -------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.37+0.04)   0.41(0.39+0.02) +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.79+0.02)   0.83(0.79+0.03) +1.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.92(0.89+0.03)   0.91(0.85+0.05) -1.1%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31(1.21+0.10)   1.33(1.22+0.10) +1.5%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.36+0.10)   1.47(1.39+0.08) +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.50(1.41+0.09)   1.51(1.42+0.09) +0.7%
      
          Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
      
  6:  050cef0081d =  7:  73ce9b54e86 diff --color-moved: unify moved block growth functions
  7:  9390e9a66eb =  8:  ef8ce0e6ebc diff --color-moved: shrink potential moved blocks as we go
  8:  1de99ac2bc3 =  9:  9d0a042eae1 diff --color-moved: stop clearing potential moved blocks
  9:  41cdedd6090 ! 10:  dd365ad115f diff --color-moved-ws=allow-indentation-change: improve hash lookups
     @@ Commit message
          As libxdiff does not have a whitespace flag to ignore the indentation
          the code for --color-moved-ws=allow-indentation-change uses
          XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
     -    there are non-indentation changes. This is filtering is inefficient as
     +    there are non-indentation changes. This filtering is inefficient as
          we have to perform another string comparison.
      
          By using the offset data that we have already computed to skip the
          indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
     -    the extra checks which improves the performance by 14% and paves the
     +    the extra checks which improves the performance by 11% and paves the
          way for the elimination of string comparisons in the next commit.
      
     -    This change slightly increases the runtime of other --color-moved
     +    This change slightly increases the run time of other --color-moved
          modes. This could be avoided by using different comparison functions
     -    for the different modes but after the changes in the next commit there
     -    is no measurable benefit.
     +    for the different modes but after the next two commits there is no
     +    measurable benefit in doing so.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.116 s ±  0.005 s    [User: 1.057 s, System: 0.056 s]
     -      Range (min … max):    1.109 s …  1.123 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.216 s ±  0.005 s    [User: 1.155 s, System: 0.059 s]
     -      Range (min … max):    1.206 s …  1.223 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
     -      Range (min … max):    1.140 s …  1.154 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
     -      Range (min … max):    1.043 s …  1.056 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.38+0.03)   0.41(0.36+0.04) +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.76+0.05)   0.84(0.79+0.04) +2.4%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.91(0.88+0.03)   0.81(0.74+0.06) -11.0%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.32(1.21+0.10)   1.31(1.19+0.11) -0.8%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.37+0.10)   1.47(1.36+0.11) +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.51(1.42+0.09)   1.48(1.37+0.10) -2.0%
      
          Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
      
  -:  ----------- > 11:  c160222ab3c diff: use designated initializers for emitted_diff_symbol
 10:  220664dd907 ! 12:  753554587f9 diff --color-moved: intern strings
     @@ Commit message
          number of hash lookups a little (calculating the ids still involves
          one hash lookup per line) but the main benefit is that when growing
          blocks of potentially moved lines we can replace string comparisons
     -    which involve chasing a pointer with a simple integer comparison.  On
     -    a large diff this commit reduces the time to run 'diff --color-moved'
     -    by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%.
     +    which involve chasing a pointer with a simple integer comparison.
      
     -    Compared to master the time to run 'git log --patch --color-moved' is
     -    increased by 2% and 'git log --patch
     -    --color-moved-ws=allow-indentation-change' in reduced by 14%. These
     -    timings were performed on an i5-7200U, on an i5-3470 both commands are
     -    faster than master. The small speed decrease on commit sized diffs is
     -    unfortunate but I think it is small enough to be worth it for the
     -    gains on larger diffs.
     +    On a large diff this commit reduces the time to run
     +       diff --color-moved
     +    by 33% and
     +        diff --color-moved-ws=allow-indentation-change
     +    by 26%. Compared to master the time to run
     +        diff --color-moved-ws=allow-indentation-change
     +    is now reduced by 95% and the overhead compared to --no-color-moved is
     +    reduced to 50%.
      
     -    Large diff before this change:
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
     -      Range (min … max):    1.140 s …  1.154 s    10 runs
     +    Compared to the previous commit the time to run
     +        git log --patch --color-moved
     +    is increased slightly, but compared to master there is no change in
     +    run time.
      
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
     -      Range (min … max):    1.043 s …  1.056 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.36+0.04)   0.41(0.37+0.03)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83(0.79+0.03)   0.55(0.52+0.03) -33.7%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.81(0.77+0.04)   0.60(0.55+0.05) -25.9%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.30(1.20+0.09)   1.31(1.22+0.08)  +0.8%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.46(1.35+0.11)   1.47(1.30+0.16)  +0.7%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.46(1.38+0.07)   1.47(1.34+0.13)  +0.7%
      
     -    Large diff after this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
     -      Range (min … max):   758.0 ms … 767.0 ms    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
     -      Range (min … max):   829.2 ms … 835.1 ms    10 runs
     -
     -    Small diffs on master
     -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
     -      Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
     -      Range (min … max):    1.566 s …  1.571 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
     -      Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
     -      Range (min … max):    1.857 s …  1.881 s    10 runs
     -
     -    Small diffs after this change
     -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
     -      Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
     -      Range (min … max):    1.591 s …  1.601 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
     -      Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
     -      Range (min … max):    1.601 s …  1.622 s    10 runs
     +    Test                                                                  master            HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.40( 0.36+0.03)  0.41(0.37+0.03)  +2.5%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82( 0.77+0.04)  0.55(0.52+0.03) -32.9%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.10(14.04+0.04)  0.60(0.55+0.05) -95.7%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.21+0.09)  1.31(1.22+0.08)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.37+0.09)  1.47(1.30+0.16)  +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.86( 1.76+0.10)  1.47(1.34+0.13) -21.0%
      
          Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
      
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
       				if (o->color_moved_ws_handling &
       				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
     -@@ diff.c: static void emit_diff_symbol_from_struct(struct diff_options *o,
     - static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
     - 			     const char *line, int len, unsigned flags)
     - {
     --	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
     -+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, 0, s};
     - 
     - 	if (o->emitted_symbols)
     - 		append_emitted_diff_symbol(o, &e);
      @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
       
       	if (o->emitted_symbols) {

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux