On Sat, Feb 27, 2021 at 7:58 PM Elijah Newren via GitGitGadget <gitgitgadget@xxxxxxxxx> wrote: > > This series depends textually on ort-perf-batch-8, but semantically it's > almost completely unrelated and can be reviewed without any familiarity with > any of the previous patch series. > > === Basic Optimization idea === > > This series determines paths which meet special cases where detection of > renames is irrelevant, where the irrelevance is due to the fact that the > merge machinery will arrive at the same result regardless of whether a > rename is detected for any of those paths. This series represents > "Optimization #2" from my Git Merge 2020 talk[1], though this series has > some improvements on the optimization relative to what I had at that time. > > The basic idea here is that if side A of history: > > * only modifies/adds/deletes a few files > * adds new files to few if any of the directories that side B deleted or > renamed > > then when we do rename detection on side B we can avoid even looking at most > (and perhaps even all) paths that side B deleted. Since commits being > rebased or cherry-picked tend to only modify a few files, this optimization > tends to be particularly effective for rebases and cherry-picks. > > Basing rename detection on what the other side of history did to a file > means that extra information needs to be fed from merge-ort to > diffcore-rename in order to take advantage of such an optimization. > > === Comparison to previous series === > > This series differs from my two previous optimizations[2][3] (focusing on > basename-guided rename detection) in two important aspects: > > * there are no behavioral changes (there is no heuristic involved) > > * this optimization is merge specific (it does not help the diff/status/log > family of commands, just merge/rebase/cherry-pick and such) > > === Results === > > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin > performance work; instrument with trace2_region_* calls", 2020-10-28), the > changes in just this series improves the performance as follows: > > Before Series After Series > no-renames: 12.596 s ± 0.061 s 5.680 s ± 0.096 s > mega-renames: 130.465 s ± 0.259 s 13.812 s ± 0.162 s > just-one-mega: 3.958 s ± 0.010 s 506.0 ms ± 3.9 ms > > > However, interestingly, if we had ignored the basename-guided rename > detection optimizations[2][3], then this optimization series would have > improved the performance as follows: > > Before Basename Series After Just This Series > no-renames: 13.815 s ± 0.062 s 5.728 s ± 0.104 s > mega-renames: 1799.937 s ± 0.493 s 18.213 s ± 0.139 s > just-one-mega 51.289 s ± 0.019 s 891.9 ms ± 7.0 ms > > > As a reminder, before any merge-ort/diffcore-rename performance work, the > performance results we started with (as noted in the same commit message) > were: > > no-renames-am: 6.940 s ± 0.485 s > no-renames: 18.912 s ± 0.174 s > mega-renames: 5964.031 s ± 10.459 s > just-one-mega: 149.583 s ± 0.751 s > > > === Competition between optimizations === > > We now have three major rename-related optimizations: > > * exact rename detection > * basename-guided rename detection[2][3] > * skip-because-unnecessary (this series) > > It is possible for all three to potentially apply for specific paths (they > do for the majority of renamed paths in our testcases), but we cannot use > more than one for any given path. It turns out that the priority we give > each optimization is very important and can drastically affect performance. > We get best results by prioritizing them as follows: > > 1. exact rename detection > 2. skip-because-unnecessary > 3. basename-guided rename detection > > The third-to-last patch of this series also discusses this ordering and > another minor variant of the skip-because-unnecessary optimization that was > tried (and resulted in less effective performance gains than reported here), > as well as some of the preparatory work over the past few years that this > series relies on in order to enable this optimization. Oops. When I restructured the series I carefully re-read the commit messages to make sure I didn't forget to update one...but I apparently forgot to update the cover letter. The discussion was actually split across a few patches by the refactoring, and what is now the third-to-last patch doesn't contain any of that discussion. > === Near optimal? === > > You may remember that there was a row labelled "everything else" from the > commit message of 557ac0350d that represented the maximum possible speed-up > from accelerating rename detection alone; as stated in that commit, those > rows represented how fast the code could be if we had somehow infinitely > parallelized the inexact rename detection. However, if you compare those > "maximum speedup" numbers to what we have above, you'll note that the > infinitely parallelized inexact rename detection would have been slightly > slower than the results we have now achieved. (The reason this is possible, > despite the fact that we still spend time in rename detection after our > optimizations, is because we implemented two optimizations outside of > diffcore_rename() along the way.) However, this good news does also come > with a downside -- it means that our remaining optimization potential is > somewhat limited, and subsequent optimization series will have to fight for > much smaller gains. > > [1] > https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf > [2] > https://lore.kernel.org/git/pull.843.git.1612651937.gitgitgadget@xxxxxxxxx/ > [3] > https://lore.kernel.org/git/pull.844.git.1613289544.gitgitgadget@xxxxxxxxx/ > > Elijah Newren (8): > diffcore-rename: enable filtering possible rename sources > merge-ort: precompute subset of sources for which we need rename > detection > merge-ort: add data structures for an alternate tree traversal > merge-ort: introduce wrappers for alternate tree traversal > merge-ort: precompute whether directory rename detection is needed > merge-ort: use relevant_sources to filter possible rename sources > merge-ort: skip rename detection entirely if possible > diffcore-rename: avoid doing basename comparisons for irrelevant > sources > > diffcore-rename.c | 63 ++++++++++--- > diffcore.h | 1 + > merge-ort.c | 236 +++++++++++++++++++++++++++++++++++++++++++++- > 3 files changed, 285 insertions(+), 15 deletions(-) > > > base-commit: 4be565c472088d4144063b736308bf2a57331f45 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-845%2Fnewren%2Fort-perf-batch-9-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-845/newren/ort-perf-batch-9-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/845 > -- > gitgitgadget