On Sun, 12 Mar 2006, Junio C Hamano wrote:
>
> Linus Torvalds <torvalds@xxxxxxxx> writes:
>
> > The "score" calculation for diffcore-rename was totally broken.
> >
> > It scaled "score" as
> >
> > 	score = src_copied * MAX_SCORE / dst->size;
> >
> > which means that you got a 100% similarity score even if src and dest were
> > different, if just every byte of dst was copied from src, even if source
> > was much larger than dst (eg we had copied 85% of the bytes, but _deleted_
> > the remaining 15%).
>
> Your reading of the code is correct, but that is deliberate.
>
> > 	/* How similar are they?
> > 	 * what percentage of material in dst are from source?
> > 	 */
>
> I wanted to say in such a case that dst was _really_ derived
> from the source.  I think using max may make more sense, but I
> need to convince myself by looking at filepairs that this change
> stops detecting as renames, and this change starts detecting as
> renames.

Just compare the result.

Just eye-balling the difference between the rename data from 2.6.12 to
2.6.14, the fixed score actually gets better rename detection. It actually
finds 133 renames (as opposed to 132 for the broken one), and the renames
it finds are more sensible.

For example, the fixed version finds

	drivers/i2c/chips/lm75.h -> drivers/hwmon/lm75.h

which actually matches the other i2c/chips/ renames, while the broken one
does

	drivers/i2c/chips/lm75.h -> drivers/media/video/rds.h

which just doesn't make any sense at all.

Now, that said, they _both_ find some pretty funky renames. I think there
is probably some serious room for improvement, regardless (or at least
changing the default similarity cut-off to something better ;)

		Linus
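
To make the 85%/15% case from the quoted text concrete, here is a minimal sketch of the two scalings side by side. It is not the actual diffcore-rename code: the function names are hypothetical, the value of MAX_SCORE is an assumption, and the "fixed" variant assumes the fix divides by the larger of the two sizes, as the "using max" remark above suggests.

```c
/* Assumed scale: a score of MAX_SCORE stands for 100% similarity. */
#define MAX_SCORE 60000UL

/*
 * Original scaling: what fraction of dst was copied from src?
 * The size of src is ignored, so material deleted from src is free.
 */
unsigned long score_dst_only(unsigned long src_copied,
			     unsigned long dst_size)
{
	if (!dst_size)
		return 0;
	return src_copied * MAX_SCORE / dst_size;
}

/*
 * Assumed fix: scale by the larger of the two sizes, so that
 * material deleted from src (or added to dst) lowers the score.
 */
unsigned long score_max_size(unsigned long src_copied,
			     unsigned long src_size,
			     unsigned long dst_size)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;

	if (!max_size)
		return 0;
	return src_copied * MAX_SCORE / max_size;
}
```

With a 1000-byte source of which 850 bytes end up in an 850-byte destination, score_dst_only(850, 850) gives 60000 (100%), while score_max_size(850, 1000, 850) gives 51000 (85%).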