[PATCH v4 0/3] And so it begins...merge/rename performance work

Elijah Newren <newren@xxxxxxxxx> · Sat, 23 Jan 2021 22:01:09 -0800

This depends on a merge of en/ort-conflict-handling, en/diffcore-rename,
and en/ort-directory-rename.

Changes since v3:
  * Add a couple preliminary patches needed for later performance work,
    including fixing a big memory leak (in some series that has already
    been merged to master, I should have included a small additional
    section of code from my 'ort' branch, but I overlooked it).
  * Update the performance numbers in the commit message of the final
    patch based on the changes; the memory leak makes a noticeable
    difference on the overall timings since it basically represents
    a combination of all the data allocated during the merge algorithm.

Elijah Newren (3):
  merge-ort: fix massive leak
  merge-ort: ignore the directory rename split conflict for now
  merge-ort: begin performance work; instrument with trace2_region_*
    calls

 diffcore-rename.c |  8 +++++
 merge-ort.c       | 87 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 94 insertions(+), 1 deletion(-)

Range-diff:
-:  ---------- > 1:  549a63cd2a merge-ort: fix massive leak
-:  ---------- > 2:  817a197dbc merge-ort: ignore the directory rename split conflict for now
1:  36d1f87d05 ! 3:  7f7d4a3e17 merge-ort: begin performance work; instrument with trace2_region_* calls
    @@ Commit message
         Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames,
         10 runs for the other two cases):

    -                      merge-recursive         merge-ort
    -      no-renames:        18.912 s ±  0.174 s    12.975 s ±  0.037 s
    -      mega-renames:    5964.031 s ± 10.459 s  5154.338 s ± 19.139 s
    -      just-one-mega:    149.583 s ±  0.751 s   146.703 s ±  0.852 s
    +                           merge-recursive           merge-ort
    +        no-renames:       18.912 s ±  0.174 s    14.263 s ±  0.053 s
    +        mega-renames:   5964.031 s ± 10.459 s  5504.231 s ±  5.150 s
    +        just-one-mega:   149.583 s ±  0.751 s   158.534 s ±  0.498 s

         A single re-run of each with some breakdowns:

    -                                      ---  no-renames  ---
    -                                merge-recursive   merge-ort
    -      overall runtime:              19.302 s        13.017 s
    -      inexact rename detection:      7.603 s         7.695 s
    -      everything else:              11.699 s         5.322 s
    +                                        ---  no-renames  ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:              19.302 s        14.257 s
    +        inexact rename detection:      7.603 s         7.906 s
    +        everything else:              11.699 s         6.351 s

    -                                      --- mega-renames ---
    -                                merge-recursive   merge-ort
    -      overall runtime:            5950.195 s      5132.851 s
    -      inexact rename detection:   5746.309 s      5119.215 s
    -      everything else:             203.886 s        13.636 s
    +                                        --- mega-renames ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:            5950.195 s      5499.672 s
    +        inexact rename detection:   5746.309 s      5487.120 s
    +        everything else:             203.886 s        17.552 s

    -                                      --- just-one-mega ---
    -                                merge-recursive   merge-ort
    -      overall runtime:             151.001 s       146.478 s
    -      inexact rename detection:    143.448 s       145.901 s
    -      everything else:               7.553 s         0.577 s
    +                                        --- just-one-mega ---
    +                                  merge-recursive   merge-ort
    +        overall runtime:             151.001 s       158.582 s
    +        inexact rename detection:    143.448 s       157.835 s
    +        everything else:               7.553 s         0.747 s

         === Timing observations ===

    +    0) Maximum speedup
    +
    +    The "everything else" row represents the maximum speedup we could
    +    achieve if we were to somehow infinitely parallelize inexact rename
    +    detection, but leave everything else alone.  The fact that this is so
    +    much smaller than the real runtime (even in the case with virtually no
    +    renames) makes it clear just how overwhelmingly large the time spent on
    +    rename detection can be.
    +
         1) no-renames

         1a) merge-ort is faster than merge-recursive, which is nice.  However,
    @@ Commit message
         2a) Obviously rename detection is a huge cost; it's where most the time
         is spent.  We need to cut that down.  If we could somehow infinitely
         parallelize it and drive its time to 0, the merge-recursive time would
    -    drop to about 204s, and the merge-ort time would drop to about 14s.  I
    +    drop to about 204s, and the merge-ort time would drop to about 17s.  I
         think this particular stat shows I've subtly baked a couple performance
    -    improvements into merge-ort[A] (one of them large) and into
    -    fast-rebase[B] already.
    -
    -        [A] Avoid quadratic behavior with O(N) insertions or removals
    -            of entries in the index & avoid unconditional dropping and
    -            re-reading of the index
    -        [B] Avoid updating the on-disk index or the working directory
    -            for intermediate patches -- only update at the end
    -
    -    2b) rename-detection is somehow ~10% cheaper for merge-ort than
    -    merge-recursive.  This was and is a big surprise to me.  Both of them
    -    call diff_tree_oid() and diffcore_std() with the EXACT same inputs.  I
    -    don't have an explanation, but it is very consistent even after
    -    re-running many times.  Interestingly, the rename detection for the
    -    first patch is more expensive (just barely) for merge-ort than
    -    merge-recursive, and that is also consistent.  I won't investigate this
    -    further, as I'm just going to focus on 1a & 2a.
    +    improvements into merge-ort and into fast-rebase already.

         3) just-one-mega

    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res

      	assert(opt->branch1 && opt->branch2);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
    - 	assert(opt->obuf.len == 0);
    - 
    - 	assert(opt->priv == NULL);
    + 		assert(!opt->priv->toplevel_dir ||
    + 		       0 == strlen(opt->priv->toplevel_dir));
    + 	}
     +	trace2_region_leave("merge", "sanity checks", opt->repo);

      	/* Default to histogram diff.  Actually, just hardcode it...for now. */
    @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_res

      	/* Initialization of opt->priv, our internal merge data */
     +	trace2_region_enter("merge", "allocate/init", opt->repo);
    - 	opt->priv = xcalloc(1, sizeof(*opt->priv));
    - 
    - 	/* Initialization of various renames fields */
    + 	if (opt->priv) {
    + 		clear_or_reinit_internal_opts(opt->priv, 1);
    + 		trace2_region_leave("merge", "allocate/init", opt->repo);
     @@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
      	 * subset of the overall paths that have special output.
      	 */
-- 
2.30.0.135.g7f7d4a3e17