On Sun, 12 Mar 2006, Linus Torvalds wrote:
>
> The "score" calculation for diffcore-rename was totally broken.
>
> It scaled "score" as
>
> 	score = src_copied * MAX_SCORE / dst->size;
>
> which means that you got a 100% similarity score even if src and dest were
> different, if just every byte of dst was copied from src, even if source
> was much larger than dst (eg we had copied 85% of the bytes, but _deleted_
> the remaining 15%).
>
> That's clearly bogus. We should do the score calculation relative not to
> the destination size, but to the max size of the two.
>
> This seems to fix it.

Btw, interestingly, this seems to actually improve on the rename detection
from your previous one, even though on the face of it, it should just have
made the scores go down.

I'm not quite sure why, but perhaps the old scaling gave a bogus high score
to some rename that wasn't very good, and removing that lets the _real_
rename make itself seen.

Or maybe I made some mistake in testing it.

		Linus

PS. You can still get a "similarity score" of 100 with the fixed scaling
even if the source and the destination were different. That happens if
every byte was marked as "copied" by the similarity estimator, which can
happen if you just move things around in the file - the end result is
different, but all the bytes are copied from the source.

At least with the fixed heuristic, that "perfect similarity" score can
_somehow_ be explained. The files are very similar in that they have the
same content, just in a different order ;)
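
To make the difference between the two scalings concrete, here is a minimal sketch in C. It is not the actual diffcore-rename code: the function names score_old and score_new, the plain size parameters, and the value used for MAX_SCORE are assumptions made for illustration. Only the two formulas come from the description above.

```c
/* Assumed scale for illustration; only the ratio matters for the comparison. */
#define MAX_SCORE 60000.0

/* Old scaling: relative to the destination size only (hypothetical helper). */
static int score_old(unsigned long src_copied,
		     unsigned long src_size,
		     unsigned long dst_size)
{
	(void)src_size;	/* the source size was ignored, which is the bug */
	if (!dst_size)
		return 0;
	return (int)(src_copied * MAX_SCORE / dst_size);
}

/* Fixed scaling: relative to the larger of the two sizes (hypothetical helper). */
static int score_new(unsigned long src_copied,
		     unsigned long src_size,
		     unsigned long dst_size)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;

	if (!max_size)
		return 0;
	return (int)(src_copied * MAX_SCORE / max_size);
}
```

With a 100-byte source of which 85 bytes are copied into an 85-byte destination (the 85%/15% case above), score_old(85, 100, 85) comes out at the full MAX_SCORE, while score_new(85, 100, 85) comes out at 85% of it. A pure reordering, where both files are 100 bytes and all 100 are marked as copied, still gets the full score from score_new, which matches the PS.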