Re: [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Feb 27, 2021 at 7:58 PM Elijah Newren via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> This series depends textually on ort-perf-batch-8, but semantically it's
> almost completely unrelated and can be reviewed without any familiarity with
> any of the previous patch series.
>
> === Basic Optimization idea ===
>
> This series determines paths which meet special cases where detection of
> renames is irrelevant, where the irrelevance is due to the fact that the
> merge machinery will arrive at the same result regardless of whether a
> rename is detected for any of those paths. This series represents
> "Optimization #2" from my Git Merge 2020 talk[1], though this series has
> some improvements on the optimization relative to what I had at that time.
>
> The basic idea here is that if side A of history:
>
>  * only modifies/adds/deletes a few files
>  * adds new files to few if any of the directories that side B deleted or
>    renamed
>
> then when we do rename detection on side B we can avoid even looking at most
> (and perhaps even all) paths that side B deleted. Since commits being
> rebased or cherry-picked tend to only modify a few files, this optimization
> tends to be particularly effective for rebases and cherry-picks.
>
> Basing rename detection on what the other side of history did to a file
> means that extra information needs to be fed from merge-ort to
> diffcore-rename in order to take advantage of such an optimization.
>
> === Comparison to previous series ===
>
> This series differs from my two previous optimizations[2][3] (focusing on
> basename-guided rename detection) in two important aspects:
>
>  * there are no behavioral changes (there is no heuristic involved)
>
>  * this optimization is merge specific (it does not help the diff/status/log
>    family of commands, just merge/rebase/cherry-pick and such)
>
> === Results ===
>
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
>
>                      Before Series           After Series
> no-renames:       12.596 s ±  0.061 s     5.680 s ±  0.096 s
> mega-renames:    130.465 s ±  0.259 s    13.812 s ±  0.162 s
> just-one-mega:     3.958 s ±  0.010 s   506.0  ms ±  3.9  ms
>
>
> However, interestingly, if we had ignored the basename-guided rename
> detection optimizations[2][3], then this optimization series would have
> improved the performance as follows:
>
>                Before Basename Series   After Just This Series
> no-renames:      13.815 s ±  0.062 s      5.728 s ±  0.104 s
> mega-renames:  1799.937 s ±  0.493 s     18.213 s ±  0.139 s
> just-one-mega    51.289 s ±  0.019 s    891.9  ms ±  7.0  ms
>
>
> As a reminder, before any merge-ort/diffcore-rename performance work, the
> performance results we started with (as noted in the same commit message)
> were:
>
> no-renames-am:      6.940 s ±  0.485 s
> no-renames:        18.912 s ±  0.174 s
> mega-renames:    5964.031 s ± 10.459 s
> just-one-mega:    149.583 s ±  0.751 s
>
>
> === Competition between optimizations ===
>
> We now have three major rename-related optimizations:
>
>  * exact rename detection
>  * basename-guided rename detection[2][3]
>  * skip-because-unnecessary (this series)
>
> It is possible for all three to potentially apply for specific paths (they
> do for the majority of renamed paths in our testcases), but we cannot use
> more than one for any given path. It turns out that the priority we give
> each optimization is very important and can drastically affect performance.
> We get best results by prioritizing them as follows:
>
>  1. exact rename detection
>  2. skip-because-unnecessary
>  3. basename-guided rename detection
>
> The third-to-last patch of this series also discusses this ordering and
> another minor variant of the skip-because-unnecessary optimization that was
> tried (and resulted in less effective performance gains than reported here),
> as well as some of the preparatory work over the past few years that this
> series relies on in order to enable this optimization.

Oops.  When I restructured the series I carefully re-read the commit
messages to make sure I didn't forget to update one...but I apparently
forgot to update the cover letter.  The discussion was actually split
across a few patches by the refactoring, and what is now the
third-to-last patch doesn't contain any of that discussion.

> === Near optimal? ===
>
> You may remember that there was a row labelled "everything else" from the
> commit message of 557ac0350d that represented the maximum possible speed-up
> from accelerating rename detection alone; as stated in that commit, those
> rows represented how fast the code could be if we had somehow infinitely
> parallelized the inexact rename detection. However, if you compare those
> "maximum speedup" numbers to what we have above, you'll note that the
> infinitely parallelized inexact rename detection would have been slightly
> slower than the results we have now achieved. (The reason this is possible,
> despite the fact that we still spend time in rename detection after our
> optimizations, is because we implemented two optimizations outside of
> diffcore_rename() along the way.) However, this good news does also come
> with a downside -- it means that our remaining optimization potential is
> somewhat limited, and subsequent optimization series will have to fight for
> much smaller gains.
>
> [1]
> https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf
> [2]
> https://lore.kernel.org/git/pull.843.git.1612651937.gitgitgadget@xxxxxxxxx/
> [3]
> https://lore.kernel.org/git/pull.844.git.1613289544.gitgitgadget@xxxxxxxxx/
>
> Elijah Newren (8):
>   diffcore-rename: enable filtering possible rename sources
>   merge-ort: precompute subset of sources for which we need rename
>     detection
>   merge-ort: add data structures for an alternate tree traversal
>   merge-ort: introduce wrappers for alternate tree traversal
>   merge-ort: precompute whether directory rename detection is needed
>   merge-ort: use relevant_sources to filter possible rename sources
>   merge-ort: skip rename detection entirely if possible
>   diffcore-rename: avoid doing basename comparisons for irrelevant
>     sources
>
>  diffcore-rename.c |  63 ++++++++++---
>  diffcore.h        |   1 +
>  merge-ort.c       | 236 +++++++++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 285 insertions(+), 15 deletions(-)
>
>
> base-commit: 4be565c472088d4144063b736308bf2a57331f45
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-845%2Fnewren%2Fort-perf-batch-9-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-845/newren/ort-perf-batch-9-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/845
> --
> gitgitgadget




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux