This depends on a merge of en/ort-conflict-handling, en/diffcore-rename, and en/ort-directory-rename. Changes since v3: * Add a couple preliminary patches needed for later performance work, including fixing a big memory leak (in some series that has already been merged to master, I should have included a small additional section of code from my 'ort' branch, but I overlooked it). * Update the performance numbers in the commit message of the final patch based on the changes; the memory leak makes a noticeable difference on the overall timings since it basically represents a combination of all the data allocated during the merge algorithm. Elijah Newren (3): merge-ort: fix massive leak merge-ort: ignore the directory rename split conflict for now merge-ort: begin performance work; instrument with trace2_region_* calls diffcore-rename.c | 8 +++++ merge-ort.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 94 insertions(+), 1 deletion(-) Range-diff: -: ---------- > 1: 549a63cd2a merge-ort: fix massive leak -: ---------- > 2: 817a197dbc merge-ort: ignore the directory rename split conflict for now 1: 36d1f87d05 ! 3: 7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls @@ Commit message Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames, 10 runs for the other two cases): - merge-recursive merge-ort - no-renames: 18.912 s ± 0.174 s 12.975 s ± 0.037 s - mega-renames: 5964.031 s ± 10.459 s 5154.338 s ± 19.139 s - just-one-mega: 149.583 s ± 0.751 s 146.703 s ± 0.852 s + merge-recursive merge-ort + no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s + mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s + just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s A single re-run of each with some breakdowns: - --- no-renames --- - merge-recursive merge-ort - overall runtime: 19.302 s 13.017 s - inexact rename detection: 7.603 s 7.695 s - everything else: 11.699 s 5.322 s + --- no-renames --- + merge-recursive merge-ort + overall runtime: 19.302 s 14.257 s + inexact rename detection: 7.603 s 7.906 s + everything else: 11.699 s 6.351 s - --- mega-renames --- - merge-recursive merge-ort - overall runtime: 5950.195 s 5132.851 s - inexact rename detection: 5746.309 s 5119.215 s - everything else: 203.886 s 13.636 s + --- mega-renames --- + merge-recursive merge-ort + overall runtime: 5950.195 s 5499.672 s + inexact rename detection: 5746.309 s 5487.120 s + everything else: 203.886 s 17.552 s - --- just-one-mega --- - merge-recursive merge-ort - overall runtime: 151.001 s 146.478 s - inexact rename detection: 143.448 s 145.901 s - everything else: 7.553 s 0.577 s + --- just-one-mega --- + merge-recursive merge-ort + overall runtime: 151.001 s 158.582 s + inexact rename detection: 143.448 s 157.835 s + everything else: 7.553 s 0.747 s === Timing observations === + 0) Maximum speedup + + The "everything else" row represents the maximum speedup we could + achieve if we were to somehow infinitely parallelize inexact rename + detection, but leave everything else alone. The fact that this is so + much smaller than the real runtime (even in the case with virtually no + renames) makes it clear just how overwhelmingly large the time spent on + rename detection can be. + 1) no-renames 1a) merge-ort is faster than merge-recursive, which is nice. However, @@ Commit message 2a) Obviously rename detection is a huge cost; it's where most the time is spent. We need to cut that down. If we could somehow infinitely parallelize it and drive its time to 0, the merge-recursive time would - drop to about 204s, and the merge-ort time would drop to about 14s. I + drop to about 204s, and the merge-ort time would drop to about 17s. I think this particular stat shows I've subtly baked a couple performance - improvements into merge-ort[A] (one of them large) and into - fast-rebase[B] already. - - [A] Avoid quadratic behavior with O(N) insertions or removals - of entries in the index & avoid unconditional dropping and - re-reading of the index - [B] Avoid updating the on-disk index or the working directory - for intermediate patches -- only update at the end - - 2b) rename-detection is somehow ~10% cheaper for merge-ort than - merge-recursive. This was and is a big surprise to me. Both of them - call diff_tree_oid() and diffcore_std() with the EXACT same inputs. I - don't have an explanation, but it is very consistent even after - re-running many times. Interestingly, the rename detection for the - first patch is more expensive (just barely) for merge-ort than - merge-recursive, and that is also consistent. I won't investigate this - further, as I'm just going to focus on 1a & 2a. + improvements into merge-ort and into fast-rebase already. 3) just-one-mega @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res assert(opt->branch1 && opt->branch2); @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result) - assert(opt->obuf.len == 0); - - assert(opt->priv == NULL); + assert(!opt->priv->toplevel_dir || + 0 == strlen(opt->priv->toplevel_dir)); + } + trace2_region_leave("merge", "sanity checks", opt->repo); /* Default to histogram diff. Actually, just hardcode it...for now. */ @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res /* Initialization of opt->priv, our internal merge data */ + trace2_region_enter("merge", "allocate/init", opt->repo); - opt->priv = xcalloc(1, sizeof(*opt->priv)); - - /* Initialization of various renames fields */ + if (opt->priv) { + clear_or_reinit_internal_opts(opt->priv, 1); + trace2_region_leave("merge", "allocate/init", opt->repo); @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result) * subset of the overall paths that have special output. */ -- 2.30.0.135.g7f7d4a3e17