This series depends on ort-perf-batch-9.

=== Basic Optimization idea ===

This series adds additional special cases where detection of renames is
irrelevant, where the irrelevance is due to the fact that the merge
machinery will arrive at the same result regardless of whether a rename
is detected for any of those paths.

That high-level wording makes it sound the same as ort-perf-batch-9, and
basically it is; it just tries to take the optimization a step further.
As noted in the last series, there are two reasons that the merge
machinery needs renames:

 * in order to do three-way content merging (pairing appropriate files)
 * in order to find where directories have been renamed

ort-perf-batch-9 provided a rough approximation for the second criterion
that was good enough, but which still left us detecting more renames
than necessary. This series focuses further on that criterion and finds
ways to avoid the need to detect as many renames while still detecting
directory renames identically to before. Thus, this series is an
improvement on "Optimization #2" from my Git Merge 2020 talk[1].

=== Results ===

For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
the changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s
mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms

While those results may look somewhat meager, it is important to note
that the previous optimizations have already reduced rename detection
time to nearly 0 for these particular testcases, so there just isn't
much left to improve.
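For a sense of scale, the ratios implied by the table above can be
computed directly. This is only a sketch: the `speedup` helper and the
dictionary layout are illustrative and not part of the series, and the
inputs are the means from the table with the ± spreads ignored.

```python
# Mean timings copied from the table above, converted to seconds
# (e.g. 506.0 ms -> 0.5060 s). The +/- spreads are deliberately ignored,
# so this compares means only and says nothing about significance.
timings = {
    "no-renames":    (5.680, 5.665),
    "mega-renames":  (13.812, 11.435),
    "just-one-mega": (0.5060, 0.4942),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster 'after' is compared to 'before'."""
    return before / after


for name, (before, after) in timings.items():
    print(f"{name + ':':<15} {speedup(before, after):.2f}x")
```

With these inputs the ratios come out at roughly 1.00x, 1.21x and 1.02x,
which matches the observation that only mega-renames moves noticeably.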

The final patch in the series shows an alternate testcase where the
previous optimizations aren't as effective (a cherry-pick of a commit
that simply adds one new empty file), where there was a speedup factor
of approximately 3 due to this series:

                     Before Series           After Series
pick-empty:        1.936 s ±  0.024 s    688.1 ms ±  4.2 ms

There was also another testcase at $DAYJOB where I saw a factor-of-7
improvement from this particular optimization, so it certainly has the
potential to help when the previous optimizations are not quite enough.

As a reminder, before any merge-ort/diffcore-rename performance work,
the performance results we started with (as noted in the same commit
message) were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s

[1] https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

Elijah Newren (8):
  diffcore-rename: take advantage of "majority rules" to skip more renames
  merge-ort, diffcore-rename: tweak dirs_removed and relevant_source type
  merge-ort: record the reason that we want a rename for a directory
  diffcore-rename: only compute dir_rename_count for relevant directories
  diffcore-rename: check if we have enough renames for directories early on
  diffcore-rename: add computation of number of unknown renames
  merge-ort: record the reason that we want a rename for a file
  diffcore-rename: determine which relevant_sources are no longer relevant

 diffcore-rename.c | 230 ++++++++++++++++++++++++++++++++++++++++------
 diffcore.h        |  19 +++-
 merge-ort.c       |  79 ++++++++++++----
 3 files changed, 281 insertions(+), 47 deletions(-)

base-commit: 98b0c7de5e70d62d47c3eeb3d290c6a234214f40
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-853%2Fnewren%2Fort-perf-batch-10-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-853/newren/ort-perf-batch-10-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/853
-- 
gitgitgadget