[PATCH v4 0/7] Optimization batch 14: trivial directory resolution

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This series depends textually on ort-perf-batch-12, but is semantically
independent. (It is both semantically and textually independent of
ort-perf-batch-13.)

Most of my previous series dramatically accelerated cases with lots of
renames, while providing comparatively minor benefits for cases with few or
no renames. This series is the opposite; it provides huge benefits when
there are few or no renames, and comparatively smaller (though still quite
decent) benefits for cases with many uncached renames.

Changes since v3:

 * Add Stolee's Acked-by.

Changes since v2, addressing feedback from Stolee:

 * Created a separate struct for three related variables to hint they are
   related
 * Simplified a lengthy comment that was duplicated by the commit message
 * Various other minor cleanups

Changes since v1:

 * Minor tweak to the final patch to correct implicit assumption that rename
   detection running implies all renames were found (rename limits could
   have been exceeded and prevented finding renames)

=== Basic Optimization idea ===

unpack_trees has had a concept of trivial merges for individual files (see
Documentation/technical/trivial-merge.txt). The same idea can be applied in
merge-ort. It'd be really nice to extend that idea to trees as well, as it
could provide a huge performance boost; sadly however, applying it in
general would wreck both regular rename detection (the unmatched side can
have new files that serve as potential destinations in rename detection) and
directory rename detection (the unmatched side could have a new directory
that was moved into it).

If we somehow knew rename detection wasn't needed, we could do trivial
directory resolution. In the past, this wasn't possible. However...

With recent optimizations we have created a possibility to do trivial
directory resolutions in some cases. These came from the addition of the
"skipping irrelevant renames" optimizations (from ort-perf-batch-9 and
ort-perf-batch-10), and in particular noting that we added an ability to
entirely skip rename detection in commit f89b4f2bee ("merge-ort: skip rename
detection entirely if possible", 2021-03-11) when there are no relevant
sources. We can detect if there are no relevant sources without recursing
into the directories in question.

As a cherry on top, the caching of renames (from ort-perf-batch-11) allows
us to cover additional cases.

This series is all about adding all the special checks needed to safely
perform trival directory resolutions.

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:        5.235 s ±  0.042 s   204.2  ms ±  3.0  ms
mega-renames:      9.419 s ±  0.107 s     1.076 s ±  0.015 s
just-one-mega:   480.1  ms ±  3.9  ms   364.1  ms ±  7.0  ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with (for merge-recursive as of git-2.30.0)
were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


Elijah Newren (7):
  merge-ort: resolve paths early when we have sufficient information
  merge-ort: add some more explanations in collect_merge_info_callback()
  merge-ort: add data structures for allowable trivial directory
    resolves
  merge-ort: add a handle_deferred_entries() helper function
  merge-ort: defer recursing into directories when merge base is matched
  merge-ort: avoid recursing into directories when we don't need to
  merge-ort: restart merge with cached renames to reduce process entry
    cost

 merge-ort.c                         | 399 +++++++++++++++++++++++++++-
 t/t6423-merge-rename-directories.sh |   2 +-
 2 files changed, 389 insertions(+), 12 deletions(-)


base-commit: 2eeee12b02e441ac05054a5a5ecbcea6964a1e6b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-988%2Fnewren%2Fort-perf-batch-14-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-988/newren/ort-perf-batch-14-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/988

Range-diff vs v3:

 1:  5dca982c0b0 ! 1:  7fdfeb159d0 merge-ort: resolve paths early when we have sufficient information
     @@ Commit message
          range of noise.  However, this idea serves as the beginning of some
          bigger optimizations coming in the following patches.
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 2:  8aea3713902 ! 2:  7a0085f2da9 merge-ort: add some more explanations in collect_merge_info_callback()
     @@ Commit message
          better.  While we're at it, add a few comments to denote what a few
          boolean '0' or '1' values stand for.
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 3:  c2b45fef1d7 ! 3:  d8165740316 merge-ort: add data structures for allowable trivial directory resolves
     @@ Commit message
          Add some data structures that we will use to do these deferrals, with
          some lengthy comments explaining their purpose.
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 4:  1cf4a47562a ! 4:  ff568a612f5 merge-ort: add a handle_deferred_entries() helper function
     @@ Commit message
          but a subsequent commit will change that.  Future commits will also make
          the function sometimes resolve directories instead of traversing inside.
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 5:  79c51536829 ! 5:  5b01c118f10 merge-ort: defer recursing into directories when merge base is matched
     @@ Commit message
          subdirectories.  The extra conditions are there for such deferred cases
          and will be used more as we do more with those variables.
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 6:  572cc5e94d2 ! 6:  7b211271815 merge-ort: avoid recursing into directories when we don't need to
     @@ Commit message
              mega-renames:      9.419 s ±  0.107 s     1.564 s ±  0.010 s
              just-one-mega:   480.1  ms ±  3.9  ms   479.5  ms ±  3.9  ms
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##
 7:  a9cbc1d4f18 ! 7:  c9ada8369e6 merge-ort: restart merge with cached renames to reduce process entry cost
     @@ Commit message
              mega-renames:      1.564 s ±  0.010 s     1.076 s ±  0.015 s
              just-one-mega:   479.5  ms ±  3.9  ms   364.1  ms ±  7.0  ms
      
     +    Acked-by: Derrick Stolee <stolee@xxxxxxxxx>
          Signed-off-by: Elijah Newren <newren@xxxxxxxxx>
      
       ## merge-ort.c ##

-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux