Re: Fix up diffcore-rename scoring

On Sun, 12 Mar 2006, Junio C Hamano wrote:
>
> Linus Torvalds <torvalds@xxxxxxxx> writes:
> 
> > The "score" calculation for diffcore-rename was totally broken.
> >
> > It scaled "score" as
> >
> > 	score = src_copied * MAX_SCORE / dst->size;
> >
> > which means that you got a 100% similarity score even if src and dest were 
> > different, if just every byte of dst was copied from src, even if source 
> > was much larger than dst (eg we had copied 85% of the bytes, but _deleted_ 
> > the remaining 15%).
> 
> Your reading of the code is correct, but that is deliberate.
> 
> >  	/* How similar are they?
> >  	 * what percentage of material in dst are from source?
> >  	 */
> 
> I wanted to say in such a case that dst was _really_ derived
> from the source.  I think using max may make more sense, but I
> need to convince myself by looking at filepairs that this change
> stops detecting as renames, and this change starts detecting as
> renames.

Just compare the results. Eye-balling the rename data from 2.6.12 to 
2.6.14, the fixed score actually gives better rename detection: it finds 
133 renames (as opposed to 132 for the broken one), and the renames it 
finds are more sensible.

For example, the fixed version finds

	drivers/i2c/chips/lm75.h -> drivers/hwmon/lm75.h

which actually matches the other i2c/chips/ renames, while the broken one 
does

	drivers/i2c/chips/lm75.h -> drivers/media/video/rds.h

which just doesn't make any sense at all.
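
To put numbers on the difference, here is a minimal sketch of the two 
scalings. It is not the actual diffcore-rename code: the function names, 
the MAX_SCORE value and the example sizes are made up for illustration, 
and the "max" variant assumes the fix divides by the larger of the two 
file sizes.

```c
#include <assert.h>

/* Assumed scale, for illustration only; not taken from this message. */
#define MAX_SCORE 60000

/* Old scaling quoted above: what fraction of dst came from src? */
int score_by_dst(unsigned long src_copied, unsigned long dst_size)
{
	assert(dst_size > 0);	/* avoid dividing by zero in this sketch */
	return (int)(src_copied * MAX_SCORE / dst_size);
}

/*
 * Hypothetical "max" variant: scale by the larger of the two sizes,
 * so bytes deleted from src pull the score down as well.
 */
int score_by_max(unsigned long src_copied,
		 unsigned long src_size, unsigned long dst_size)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;

	assert(max_size > 0);
	return (int)(src_copied * MAX_SCORE / max_size);
}
```

With a hypothetical 1000-byte source of which 850 bytes survive into an 
850-byte dst, score_by_dst(850, 850) gives 60000 (a "perfect" match), 
while score_by_max(850, 1000, 850) gives 51000, ie 85% of MAX_SCORE.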

Now, that said, they _both_ find some pretty funky renames. I think there 
is serious room for improvement regardless, or at least for changing the 
default similarity cut-off to something better ;)

		Linus