Re: [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames when possible

Derrick Stolee <stolee@xxxxxxxxx> · Sat, 22 May 2021 07:17:01 -0400

On 5/19/21 8:36 PM, Elijah Newren wrote:
> On Mon, May 17, 2021 at 7:23 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:>> As I was reading, I was also thinking that it would be good to
>> have some kind of tracing, if only a summary of how often we
>> relied upon the cached renames.
> 
> That's an interesting thought.  If there are renames that remain
> cached, the code always uses them (whether we need the cached rename
> or not doesn't matter since we already have it and it is cheap to
> use), so a summary would probably be the only thing that would make
> sense.
> 
> The easiest summary of how often we rely upon cached renames, is
> checking if we have enough to avoid calling diffcore_rename_extended()
> entirely.  That's precisely what the code above does that you are
> responding to.
> 
> What more information could we get out?  I guess we could get ever so
> slightly more information by tracking how many times we decide that
> the cache from a previous merge can be re-used (by tracking the number
> of times that cached_valid_side is set to 1 or 2 instead of 0).  Since
> we sometimes have some cached renames but not enough to skip rename
> detection (because source paths that were irrelevant in previous
> commits are marked relevant for the current commit), this would be an
> independent number from the region_enter-diffcore_rename count used
> above.
> 
>> That would present a mechanism
>> for the test cases to verify that the rename cache is behaving
>> as expected
> 
> How so?  I did check something like that above with verifying I was
> able to reduce the number of calls to diffcore_rename_extended() while
> still getting the expected result (which can only be done if renames
> are known).  The only additional thing I can think of we could check
> would be a testcase where renames are cached and can be re-used but
> it's not enough to avoid calling diffcore_rename_extended().  Did you
> have something else in mind?
> 
>> but also provides a way to diagnose any issues that
>> might arise in the future by asking a user to reproduce a
>> problematic rebase/merge with a GIT_TRACE2* target. Such a
>> change would fit as a follow-up, and does not need to insert
>> into an already heavy patch.
> 
> I'm not sure what information could be recorded that would help
> diagnose any such issues.  "9 out of 10 commits in your rebase reused
> cached renames" just doesn't seem granular enough to help.  Is there
> something else you were thinking of recording?

My initial thought was to include basic summary statistics, such
as "number of cached renames used" and "number of new renames" and
"number of invalidated cached renames" or something, summarized per
commit in the list. The information might not be a clear way to
find a root cause to a strange rebase, but it would help someone
looking for the root cause to know whether this rename cache is
involved or not.

As for the use in the test cases, we might be able to see the list
of trace2 results and extract the number of each type of statistic,
matching them to our expectations. That might make the tests too
fragile to future changes, though.

I'm commenting also on your v3 that we don't need to pursue the
trace2 idea right now, and can do it if we find it necessary to
diagnose any issues with the feature.

Thanks,
-Stolee