Re: Fix up diffcore-rename scoring

On Sun, 12 Mar 2006, Linus Torvalds wrote:
> 
> The "score" calculation for diffcore-rename was totally broken.
> 
> It scaled "score" as
> 
> 	score = src_copied * MAX_SCORE / dst->size;
> 
> which means that you got a 100% similarity score even if src and dest were 
> different, if just every byte of dst was copied from src, even if source 
> was much larger than dst (eg we had copied 85% of the bytes, but _deleted_ 
> the remaining 15%).
> 
> That's clearly bogus. We should do the score calculation relative not to 
> the destination size, but to the max size of the two.
> 
> This seems to fix it.

Btw, interestingly, this seems to actually improve on the rename 
detection from your previous one, even though on the face of it, it 
should just have made the scores go down.
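
To see why the scores can only go down, here is a minimal sketch of the two 
scalings side by side. The function names and the 0..100 scale are 
assumptions made for illustration; they are not the actual diffcore-rename 
code or its real MAX_SCORE value.

```c
#include <assert.h>

/* Hypothetical scale for this sketch: 100 means "every byte accounted for". */
#define SKETCH_MAX_SCORE 100UL

/* Old scaling, as described in the quoted text: relative to dst size only. */
unsigned long score_vs_dst(unsigned long src_copied, unsigned long dst_size)
{
	assert(dst_size > 0);
	return src_copied * SKETCH_MAX_SCORE / dst_size;
}

/* Fixed scaling: relative to the larger of the two sizes. */
unsigned long score_vs_max(unsigned long src_copied,
			   unsigned long src_size,
			   unsigned long dst_size)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;

	assert(max_size > 0);
	return src_copied * SKETCH_MAX_SCORE / max_size;
}
```

With a 1000-byte source and an 850-byte destination whose bytes all came 
from the source, score_vs_dst(850, 850) gives 100 while 
score_vs_max(850, 1000, 850) gives 85. Since max(src, dst) is never smaller 
than dst, the fixed score can never exceed the old one for the same inputs.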

I'm not quite sure why, but perhaps it gave a bogus high score to some 
rename that wasn't very good, allowing the _real_ rename to make itself 
seen.

Or maybe I made some mistake in testing it.

		Linus

PS. You can still get a "similarity score" of 100 with the fixed scaling 
even if the source and the destination were different. That happens if 
every byte was marked as "copied" by the similarity estimator. Which can 
happen if you just move things around in the file - the end result is 
different, but all the bytes are copied from the source.

At least with the fixed heuristic, that "perfect similarity" score can 
_somehow_ be explained. The files are very similar in that they have the 
same content, just in a different order ;)
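
The reordering case can be illustrated with a toy example. The estimator 
below is NOT the real similarity estimator: it is a deliberately simple, 
hypothetical stand-in that counts a destination byte as "copied" whenever 
the source contains a matching byte anywhere, regardless of position. All 
names and the 0..100 scale are assumptions for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SCORE 100UL

/* Toy estimator: bytes of dst that can be paired with an equal byte of src,
 * ignoring where in the file each byte sits. */
unsigned long toy_copied_bytes(const char *src, const char *dst)
{
	unsigned long src_count[256] = {0}, dst_count[256] = {0};
	unsigned long copied = 0;
	size_t i;

	for (i = 0; src[i]; i++)
		src_count[(unsigned char)src[i]]++;
	for (i = 0; dst[i]; i++)
		dst_count[(unsigned char)dst[i]]++;
	for (i = 0; i < 256; i++)
		copied += src_count[i] < dst_count[i] ? src_count[i] : dst_count[i];
	return copied;
}

/* Score scaled against the larger of the two sizes, as in the fixed heuristic. */
unsigned long toy_score(const char *src, const char *dst)
{
	size_t src_size = strlen(src), dst_size = strlen(dst);
	size_t max_size = src_size > dst_size ? src_size : dst_size;

	if (max_size == 0)
		return TOY_MAX_SCORE;	/* two empty inputs: treat as identical */
	return toy_copied_bytes(src, dst) * TOY_MAX_SCORE / max_size;
}
```

Here toy_score("abcdef", "defabc") is 100 even though the two strings 
differ, because every byte is accounted for and the sizes match. Dropping 
half the content, as in toy_score("abcdef", "abc"), gives 50.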