On Sat, Feb 13, 2021 at 5:32 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> I do not consider "the same file changed in place" the same as "we
> seem to have lost a file in the old tree, ah, we found one that has
> the same basename in a different directory" at all, so your argument
> still does not make any sense to me, sorry.

I'm not set on the commit message wording; you asked why I had used the
terms I did, and I tried to explain. I also explained how the wording
seemed to have helped Stolee understand. If you'd like to suggest an
alternative commit message, I'm happy to take it.

> On Sat, Feb 13, 2021 at 17:25, Elijah Newren <newren@xxxxxxxxx> wrote:
> >
> > On Sat, Feb 13, 2021 at 3:56 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > >
> > > Elijah Newren <newren@xxxxxxxxx> writes:
> > >
> > > > This is not true. If src/main.c is 99% similar to src/foo.c, and is
> > > > 0% similar to the src/main.c in the new commit, we match the old
> > > > src/main.c to the new src/main.c despite being far more similar
> > > > to src/foo.c. Unless break detection is turned on, we do not allow
> > > > content similarity to trump (full) filename equality.
> > > >
> > > Absolutely. And we are talking about a new optimization that kicks
> > > in only when there is no break or no copy detection going on, no?
> > >
> > Yes, precisely, we are only considering cases without break
> > detection...and thus we are considering cases where for the last 15
> > years or more, sufficiently large filename similarity (an exact
> > fullname match) trumps any level of content similarity. I think it is
> > useful to note that while my optimization is adding more
> > considerations that can overrule maximal content similarity, it is not
> > the first such code choice to do that.
> >
> > But let me back up a bit...
> >
> > When I submitted the series, you and Stolee went into a long
> > discussion about an optimization that I didn't submit, one that feels
> > looser on "matching" than anything I submitted, and which I think
> > might counter-intuitively reduce performance rather than aid it. (The
> > performance side only comes into view in combination with later
> > series, but it was why I harped so much since then on only comparing
> > against at most one other file in the steps before full inexact rename
> > detection.) I was quite surprised by the diversion, but it made it
> > clear to me that my descriptions and commit messages were far too
> > vague and could be read to imply a completely different algorithm than
> > I intended. So, I tried to be far more careful in subsequent
> > iterations by adding wider context and contrasts.
> >
> > Further, after I wrote various things to try to clarify the
> > misunderstandings, I noticed that Stolee picked out one thing and
> > stated that "This idea of optimizing first for 100% filename
> > similarity is a good perspective on Git's rename detection algorithm."
> > (see https://lore.kernel.org/git/57d30e7d-7727-8d98-e3ef-bcfeebf9edd3@xxxxxxxxx/)
> > So, that particular point seemed to help him understand more, and
> > thus might be useful extra context for others reading along now or in
> > the future.
> >
> > Given all the above, I was trying to address earlier misunderstandings
> > and provide more context. Perhaps I swung the pendulum too far and
> > talked too much about other cases, or perhaps I just worded things
> > poorly again. All I was attempting to do in the commit message was
> > point out the multiple basic rules with filename and content
> > similarity, to lay the groundwork for new rules that do alternative
> > weightings.
> >
> > Anyway, I've added a few more tweaks to try to improve the wording for
> > the next round I'll submit today.
> > Given my track record so far, it
> > would not be surprising if it still needed more tweaks.
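
For anyone reading along later, the src/main.c example quoted above can
be illustrated with a rough sketch. This is not Git's diffcore code: the
function name pair_paths, the similarity callback, the greedy matching
loop, and the min_score parameter are all hypothetical, and it only
models the case where break detection and copy detection are off. It is
meant to show just one thing: a path that exists on both sides is paired
with itself before content similarity is ever consulted.

```python
def pair_paths(old_paths, new_paths, similarity, min_score=50):
    """Toy model of the rule discussed in this thread (not Git's code).

    old_paths, new_paths: sets of path strings.
    similarity: hypothetical callback (old_path, new_path) -> 0..100.
    Returns a dict mapping each paired old path to its new path.
    """
    pairs = {}

    # Step 1: an exact full-path match always wins, whatever the content.
    for path in sorted(old_paths & new_paths):
        pairs[path] = path

    # Step 2: only the leftover paths compete on content similarity.
    # (A simplified greedy pass, assumed here purely for illustration.)
    unmatched_old = sorted(old_paths - set(pairs))
    unmatched_new = set(new_paths) - set(pairs.values())
    for old in unmatched_old:
        if not unmatched_new:
            break
        best = max(sorted(unmatched_new), key=lambda new: similarity(old, new))
        if similarity(old, best) >= min_score:
            pairs[old] = best
            unmatched_new.discard(best)
    return pairs


# The example from the thread: the old src/main.c is 99% similar to the
# new src/foo.c and 0% similar to the new src/main.c.
scores = {
    ("src/main.c", "src/foo.c"): 99,
    ("src/main.c", "src/main.c"): 0,
}


def similarity(old, new):
    return scores.get((old, new), 0)


result = pair_paths({"src/main.c"}, {"src/main.c", "src/foo.c"}, similarity)
print(result)  # {'src/main.c': 'src/main.c'}
```

In this toy model src/foo.c simply shows up as an added file, even though
it is the far better content match for the old src/main.c.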