Re: [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames

Derrick Stolee <stolee@xxxxxxxxx> · Tue, 9 Mar 2021 17:08:37 -0500

On 3/8/2021 7:09 PM, Elijah Newren via GitGitGadget wrote:> The basic idea here is:
> 
> We only need expensive rename detection on the subset of files changed on
> both sides of history (for the most part).
> 
> This is because:
> 
>  1. The primary reason for rename detection in merges is enabling three-way
>     content merges
>  2. The purpose of three-way content merges is reconciling changes when
> 
> both sides of history modified some file 3. If a file was only modified by
> the side that renamed the file, then detecting the rename is irrelevant;
> we'll get the same answer without knowing about the rename. 4. (Well...there
> are rare cases where we need the rename for reasons other than three-way
> content merges. Patch 5 explains those.)

Makes sense. Don't compute information you won't need. I look forward to
trying to figure out the special cases here.

>                      Before Series           After Series
> no-renames:       12.596 s ±  0.061 s     5.680 s ±  0.096 s
> mega-renames:    130.465 s ±  0.259 s    13.812 s ±  0.162 s
> just-one-mega:     3.958 s ±  0.010 s   506.0  ms ±  3.9  ms

These are _very_ impressive numbers for such a "simple" idea.

> However, interestingly, if we had ignored the basename-guided rename
> detection optimizations[2][3], then this optimization series would have
> improved the performance as follows:
> 
>                Before Basename Series   After Just This Series
> no-renames:      13.815 s ±  0.062 s      5.728 s ±  0.104 s
> mega-renames:  1799.937 s ±  0.493 s     18.213 s ±  0.139 s
> just-one-mega    51.289 s ±  0.019 s    891.9  ms ±  7.0  ms

And here it is even more impressive. I see that your optimizations are
having combined effects but also are doing valuable things on their
own.

> We get best results by prioritizing them as follows:
> 
>  1. exact rename detection
>  2. skip-because-unnecessary
>  3. basename-guided rename detection

This makes sense to me, since even the basename-guided rename is
doing some non-trivial work. It would be good to reduce that
effort.

> it means that our remaining optimization potential is
> somewhat limited, and subsequent optimization series will have to fight for
> much smaller gains.

This is a good place to end up. Let the code rest for a bit after
we are done here, and maybe we'll find new cases to care about
later. We could chase the long tail forever, but these steps are
a huge accomplishment!

Getting to reading now.

-Stolee