[PATCH 0/8] Optimization batch 10: avoid detecting even more irrelevant renames

This series depends on ort-perf-batch-9.

=== Basic Optimization idea ===

This series adds further special cases where rename detection is
irrelevant, meaning the merge machinery will arrive at the same result
regardless of whether a rename is detected for any of those paths. That
high-level wording makes it sound the same as ort-perf-batch-9, and
basically it is; this series just takes the optimization a step further.

As noted in the last series, there are two reasons that the merge machinery
needs renames:

 * in order to do three-way content merging (pairing appropriate files)
 * in order to find where directories have been renamed

ort-perf-batch-9 provided a rough approximation for the second criterion
that was good enough, but which still left us detecting more renames than
necessary. This series focuses further on that criterion and finds ways to
avoid the need to detect as many renames while still detecting directory
renames identically to before. Thus, this series is an improvement on
"Optimization #2" from my Git Merge 2020 talk[1].
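
To give a feel for why some renames can be skipped without changing which
directory rename is detected, here is a small illustration of one way to
read the "majority rules" phrase in the first patch title. It is a sketch
only, not code from the series: the function name, its arguments, and the
exact inequality are all assumptions made for illustration. The idea is
that once one target directory leads by more than the number of files whose
rename is still unknown, those unknown files cannot change the winner.

```python
from collections import Counter

def directory_rename_decided(known_targets, unknown):
    """Hypothetical helper, not from the series.

    known_targets: target directory of each rename already found for
                   files that left one source directory.
    unknown:       number of files from that directory whose rename
                   status has not been determined yet.
    Returns the winning target directory if the remaining unknown files
    could not change it, else None.
    """
    top_two = Counter(known_targets).most_common(2)
    if not top_two:
        return None
    best_dir, best = top_two[0]
    runner_up = top_two[1][1] if len(top_two) > 1 else 0
    # Worst case: every unknown file turns out to be a rename into the
    # runner-up directory (or into one brand new directory).
    if best > runner_up + unknown:
        return best_dir
    return None
```

With five known renames into "new/", one into "other/" and two files still
unknown, the sketch reports "new/" as decided, so detecting renames for the
two remaining files would not affect the directory rename.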

=== Results ===

For the testcases mentioned in commit 557ac03 ("merge-ort: begin performance
work; instrument with trace2_region_* calls", 2020-10-28), the changes in
just this series improve the performance as follows:

                     Before Series           After Series
no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s 
mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms


While those results may look somewhat meager, it is important to note that
the previous optimizations have already reduced rename detection time to
nearly 0 for these particular testcases, so there just isn't much left to
improve. The final patch in the series shows an alternate testcase where the
previous optimizations aren't as effective (a cherry-pick of a commit that
simply adds one new empty file), where there was a speedup factor of
approximately 3 due to this series:

                     Before Series           After Series
pick-empty:        1.936 s ±  0.024 s     688.1 ms ±  4.2 ms
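
For readers who want the ratios behind those tables, the following short
script divides the "Before Series" mean by the "After Series" mean for each
testcase. The numbers are copied from the tables above; the dictionary and
function names are just for this illustration, and the error bars are
ignored.

```python
# Mean wall-clock times in seconds, copied from the tables above.
timings = {
    "no-renames":    (5.680, 5.665),
    "mega-renames":  (13.812, 11.435),
    "just-one-mega": (0.5060, 0.4942),
    "pick-empty":    (1.936, 0.6881),
}

def speedup(before, after):
    """Ratio of mean run times; values above 1 mean the series is faster."""
    return before / after

for name, (before, after) in timings.items():
    print(f"{name:14s} {speedup(before, after):.2f}x")
```

The pick-empty ratio comes out at roughly 2.8, which is the "approximately
3" quoted above, while the other three testcases stay between 1.0 and about
1.2.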


There was also another testcase at $DAYJOB where I saw a factor of 7
improvement from this particular optimization, so it certainly has the
potential to help when the previous optimizations are not quite enough.

As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with (as noted in the same commit message)
were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s
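
To put the starting numbers next to the "After Series" column from the
Results table, this sketch computes the cumulative ratio for the three
testcases that appear in both places. no-renames-am is left out because
this letter gives no after-series figure for it. The names used here are
for illustration only.

```python
# (mean seconds before any performance work, mean seconds after this series)
cumulative = {
    "no-renames":    (18.912, 5.665),
    "mega-renames":  (5964.031, 11.435),
    "just-one-mega": (149.583, 0.4942),
}

def overall_speedup(baseline, current):
    """Ratio of the original mean run time to the current one."""
    return baseline / current

for name, (baseline, current) in cumulative.items():
    print(f"{name:14s} {overall_speedup(baseline, current):.1f}x")
```

That works out to a bit over 3x for no-renames, a bit over 500x for
mega-renames, and about 300x for just-one-mega.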


[1]
https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

Elijah Newren (8):
  diffcore-rename: take advantage of "majority rules" to skip more
    renames
  merge-ort, diffcore-rename: tweak dirs_removed and relevant_source
    type
  merge-ort: record the reason that we want a rename for a directory
  diffcore-rename: only compute dir_rename_count for relevant
    directories
  diffcore-rename: check if we have enough renames for directories early
    on
  diffcore-rename: add computation of number of unknown renames
  merge-ort: record the reason that we want a rename for a file
  diffcore-rename: determine which relevant_sources are no longer
    relevant

 diffcore-rename.c | 230 ++++++++++++++++++++++++++++++++++++++++------
 diffcore.h        |  19 +++-
 merge-ort.c       |  79 ++++++++++++----
 3 files changed, 281 insertions(+), 47 deletions(-)


base-commit: 98b0c7de5e70d62d47c3eeb3d290c6a234214f40
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-853%2Fnewren%2Fort-perf-batch-10-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-853/newren/ort-perf-batch-10-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/853
-- 
gitgitgadget


