On Sat, Feb 13, 2021 at 3:56 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Elijah Newren <newren@xxxxxxxxx> writes:
>
> > This is not true. If src/main.c is 99% similar to src/foo.c, and is
> > 0% similar to the src/main.c in the new commit, we match the old
> > src/main.c to the new src/main.c despite being far more similar
> > src/foo.c. Unless break detection is turned on, we do not allow
> > content similarity to trump (full) filename equality.
>
> Absolutely. And we are talking about a new optimization that kicks
> in only when there is no break or no copy detection going on, no?

Yes, precisely, we are only considering cases without break
detection...and thus we are considering cases where for the last 15
years or more, sufficiently large filename similarity (an exact
fullname match) trumps any level of content similarity. I think it is
useful to note that while my optimization is adding more
considerations that can overrule maximal content similarity, it is not
the first such code choice to do that.

But let me back up a bit...

When I submitted the series, you and Stolee went into a long
discussion about an optimization that I didn't submit, one that feels
looser on "matching" than anything I submitted, and which I think
might counter-intuitively reduce performance rather than aid it. (The
performance side only comes into view in combination with later
series, but it was why I harped so much since then on only comparing
against at most one other file in the steps before full inexact rename
detection.) I was quite surprised by the diversion, but it made it
clear to me that my descriptions and commit messages were far too
vague and could be read to imply a completely different algorithm than
I intended. So, I tried to be far more careful in subsequent
iterations by adding wider context and contrasts.
Further, after I wrote various things to try to clarify the
misunderstandings, I noticed that Stolee picked out one thing and
stated that "This idea of optimizing first for 100% filename
similarity is a good perspective on Git's rename detection algorithm."
(see https://lore.kernel.org/git/57d30e7d-7727-8d98-e3ef-bcfeebf9edd3@xxxxxxxxx/)
So, that particular point seemed to help him understand more, and thus
might be useful extra context for others reading along now or in the
future.

Given all the above, I was trying to address earlier misunderstandings
and provide more context. Perhaps I swung the pendulum too far and
talked too much about other cases, or perhaps I just worded things
poorly again. All I was attempting to do in the commit message was
point out the multiple basic rules with filename and content
similarity, to lay the groundwork for new rules that do alternative
weightings.

Anyway, I've added a few more tweaks to try to improve the wording for
the next round I'll submit today. Given my track record so far, it
would not be surprising if it still needed more tweaks.
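
For anyone reading along later, the basic rule discussed at the top of
this mail (without break detection, an exact full-path match is paired
up before content similarity is ever consulted) can be illustrated with
a toy model. This is only a sketch in Python, not Git's actual C code:
the names pair_files and similarity, the use of difflib for scoring,
the greedy matching of leftovers, and the 0.5 threshold are all
assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Toy content similarity in [0, 1] (difflib, not Git's own scoring)."""
    return SequenceMatcher(None, a, b).ratio()


def pair_files(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Map each old path to the new path it is paired with.

    `old` and `new` map path -> content. Models the case with no break
    detection and no copy detection.
    """
    # Rule 1: a path present on both sides is paired with itself,
    # however dissimilar the two contents are.
    pairs = {path: path for path in old if path in new}

    # Rule 2: only leftover paths become rename candidates, and only
    # there does content similarity decide (greedy here for brevity).
    sources = [p for p in old if p not in new]
    targets = {p for p in new if p not in old}
    for src in sources:
        best = max(targets,
                   key=lambda dst: similarity(old[src], new[dst]),
                   default=None)
        if best is not None and similarity(old[src], new[best]) >= threshold:
            pairs[src] = best
            targets.discard(best)
    return pairs


# The example from the quoted text: the old src/main.c is nearly
# identical to the new src/foo.c and unlike the new src/main.c.
old = {"src/main.c": "int main(void) { return run(); }\n"}
new = {
    "src/main.c": "# unrelated replacement text\n",
    "src/foo.c": "int main(void) { return run(); }\n",
}
result = pair_files(old, new)
print(result)  # {'src/main.c': 'src/main.c'}
```

In this model the new src/foo.c is simply left unpaired (an added
file), because the old src/main.c never enters the similarity step at
all.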