Derrick Stolee <stolee@xxxxxxxxx> writes: > Before I get too deep into reviewing these patches, I do want > to make it clear that the speed-up is coming at the cost of > a behavior change. We are restricting the "best match" search > to be first among files with common base name (although maybe > I would use 'suffix'?). If we search for a rename among all > additions and deletions ending the ".txt" we might find a > similarity match that is 60% and declare that a rename, even > if there is a ".txt" -> ".md" pair that has a 70% match. Yes, my initial reaction to the idea was that "yuck, our rename detection lost its purity". diffcore-rename strived to base its decision purely on content similarity, primarily because it is one of the oldest part of Git where the guiding principle has always been that the content is the king. I think my aversion to the "all of my neighbors are relocating, so should I move to the same place" (aka "directory rename") comes from a similar place, but in a sense this was worse. At least, until I got over the initial bump. I do not think the suffix match is necessarily a bad idea, but it adds more "magically doing a wrong thing" failure modes (e.g. the ".txt" to ".md" example would probably have more variants that impact the real life projects; ".C" vs ".cc" vs ".cxx" vs ".cpp" immediately comes to mind), and a tool that silently does a wrong thing because it uses more magic would be a tool that is hard to explain why it did the wrong thing when it does. > This could be documented in a test case, to demonstrate that > we are making this choice explicitly. Yes. I wonder if we can solve it by requiring a lot better than minimum match when trying the "suffix match" first, or something? Provided if we agree that it is a good idea to insert this between "exact contents match" and "full matrix", I have one question to Elijah on what the code does. To me, it seems that the "full matrix" part still uses the remaining src and dst candidates fully. But if "A.txt" and "B.txt" are still surviving in the src/dst at that stage, shouldn't we be saying that "no way these can be similar enough---we've checked in the middle stage where only the ones with the same suffix are considered and this pair didn't turn into a rename"? Thanks.