I do not consider "the same file changed in place" the same as "we
seem to have lost a file in the old tree, ah, we found one that has
the same basename in a different directory" at all, so your argument
still does not make any sense to me, sorry.

On Sat, Feb 13, 2021 at 17:25, Elijah Newren <newren@xxxxxxxxx> wrote:
>
> On Sat, Feb 13, 2021 at 3:56 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> >
> > Elijah Newren <newren@xxxxxxxxx> writes:
> >
> > > This is not true.  If src/main.c is 99% similar to src/foo.c, and is
> > > 0% similar to the src/main.c in the new commit, we match the old
> > > src/main.c to the new src/main.c despite being far more similar to
> > > src/foo.c.  Unless break detection is turned on, we do not allow
> > > content similarity to trump (full) filename equality.
> >
> > Absolutely.  And we are talking about a new optimization that kicks
> > in only when there is no break or no copy detection going on, no?
>
> Yes, precisely, we are only considering cases without break
> detection...and thus we are considering cases where for the last 15
> years or more, sufficiently large filename similarity (an exact
> fullname match) trumps any level of content similarity.  I think it is
> useful to note that while my optimization is adding more
> considerations that can overrule maximal content similarity, it is not
> the first such code choice to do that.
>
> But let me back up a bit...
>
> When I submitted the series, you and Stolee went into a long
> discussion about an optimization that I didn't submit, one that feels
> looser on "matching" than anything I submitted, and which I think
> might counter-intuitively reduce performance rather than aid it.  (The
> performance side only comes into view in combination with later
> series, but it was why I harped so much since then on only comparing
> against at most one other file in the steps before full inexact rename
> detection.)  I was quite surprised by the diversion, but it made it
> clear to me that my descriptions and commit messages were far too
> vague and could be read to imply a completely different algorithm than
> I intended.  So, I tried to be far more careful in subsequent
> iterations by adding wider context and contrasts.
>
> Further, after I wrote various things to try to clarify the
> misunderstandings, I noticed that Stolee picked out one thing and
> stated that "This idea of optimizing first for 100% filename
> similarity is a good perspective on Git's rename detection algorithm."
> (see https://lore.kernel.org/git/57d30e7d-7727-8d98-e3ef-bcfeebf9edd3@xxxxxxxxx/)
> So, that particular point seemed to help him understand more, and
> thus might be useful extra context for others reading along now or in
> the future.
>
> Given all the above, I was trying to address earlier misunderstandings
> and provide more context.  Perhaps I swung the pendulum too far and
> talked too much about other cases, or perhaps I just worded things
> poorly again.  All I was attempting to do in the commit message was
> point out the multiple basic rules with filename and content
> similarity, to lay the groundwork for new rules that do alternative
> weightings.
>
> Anyway, I've added a few more tweaks to try to improve the wording for
> the next round I'll submit today.  Given my track record so far, it
> would not be surprising if it still needed more tweaks.
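
For anyone reading along, the existing rule discussed in the quoted
text (without break detection, an exact full-path match is paired
before content similarity is ever consulted) can be sketched roughly
as below.  This is a toy Python illustration under stated assumptions,
not Git's diffcore-rename code: `similarity`, `pair_files`, the 0.5
threshold and the greedy leftover matching are all made up for the
example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Toy content similarity in [0, 1]; Git uses its own scoring."""
    return SequenceMatcher(None, a, b).ratio()


def pair_files(old, new, threshold=0.5):
    """Map each old path to a new path, or to None if nothing matches.

    Hypothetical sketch: `old` and `new` map path -> file content.
    """
    pairs = {}

    # Step 1: a path present on both sides is paired with itself,
    # no matter how dissimilar the two contents are.
    for path in old:
        if path in new:
            pairs[path] = path

    # Step 2: only the leftovers are compared by content similarity.
    free = [p for p in new if p not in old]
    for path in old:
        if path in pairs:
            continue
        best, best_score = None, threshold
        for cand in free:
            score = similarity(old[path], new[cand])
            if score >= best_score:
                best, best_score = cand, score
        pairs[path] = best
        if best is not None:
            free.remove(best)
    return pairs


# The old src/main.c is identical to the new src/foo.c and shares almost
# nothing with the new src/main.c, yet the exact path match wins.
old_tree = {"src/main.c": "int main(void) { return run(); }\n"}
new_tree = {
    "src/main.c": "# unrelated\n",
    "src/foo.c": "int main(void) { return run(); }\n",
}
result = pair_files(old_tree, new_tree)
```

Here `result` pairs the old src/main.c with the new src/main.c, and
src/foo.c is simply treated as a new file.  Only when a path has
disappeared from the new tree does the second step look for a partner
by content.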