On Tue, 30 Oct 2007, Jeff King wrote:
>
> On Mon, Oct 29, 2007 at 10:06:11PM -0700, Linus Torvalds wrote:
>
> > Have you compared the results? IOW, does it find the *same* renames?
>
> From my limited testing, it generally finds the same pairs. However,
> there are a number of renames that it _doesn't_ find, because they are
> composed of "uninteresting" lines, dropping them below the minimum
> score. Try (in git.git):
>
>   git-show --raw -M -l0 :/'Big tool rename'
>
> with the old and new code. Pairs like Documentation/git-add-script.txt
> -> Documentation/git-add.txt are not found, because the file is composed
> almost entirely of boilerplate.

Ok, that does imply to me that we cannot just drop boilerplate text,
because the fact is, lots of files contain boilerplate, but people still
think they are "similar".

We do actually depend on the similarity analysis being "good" - because it
matters a lot for things like merging.

The old code was actually very careful indeed, and while it didn't care
about things like the exact *ordering* of lines (ie moving functions
around in the same file resulted in the *exact* same fingerprint for the
file!) it cared about everything else.

> Moving the size normalization into the similarity engine should probably
> fix that, and will let us compare old and new results more accurately.
> I'll try to work on that.

Hmm. I hope that is sufficient. But I suspect it may well not be.
Especially since you ignore boiler-plate lines for *some* files but not
others (ie it depends on which file you happen to find it in first).

		Linus
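
To make the two points above concrete, here is a minimal Python sketch. It is not git's actual similarity code, just a simplified line-multiset model. The function names, the BOILERPLATE set, and the two tiny file bodies are all made up for illustration. It shows that an order-insensitive fingerprint is unchanged when lines are shuffled, and that dropping "uninteresting" lines while still normalizing by the full file size can push a mostly-boilerplate pair down to a score of zero.

```python
from collections import Counter

def fingerprint(text, ignore=frozenset()):
    """Order-insensitive fingerprint: a multiset of the file's lines.

    Lines listed in `ignore` (hypothetical "boilerplate") are dropped.
    """
    return Counter(line for line in text.splitlines() if line not in ignore)

def weight(fp):
    """Total number of characters covered by a fingerprint."""
    return sum(len(line) * count for line, count in fp.items())

def similarity(src, dst, ignore=frozenset()):
    """Shared characters divided by the size of the larger file.

    Note that the denominator always uses the *full* files, even when
    `ignore` removes lines from the shared count. That mismatch is the
    assumption this sketch uses to model the size-normalization problem.
    """
    shared = weight(fingerprint(src, ignore) & fingerprint(dst, ignore))
    full = max(weight(fingerprint(src)), weight(fingerprint(dst)))
    return shared / full if full else 0.0

# Made-up stand-ins for two documentation files that are mostly boilerplate.
BOILERPLATE = frozenset({"NAME", "SYNOPSIS", "DESCRIPTION", "Author", "GIT"})
old = "NAME\ngit-add-script\nSYNOPSIS\nDESCRIPTION\nAuthor\nGIT\n"
new = "NAME\ngit-add\nSYNOPSIS\nDESCRIPTION\nAuthor\nGIT\n"

# Reordering lines does not change the fingerprint.
same_order_blind = fingerprint("a\nb\nc\n") == fingerprint("c\na\nb\n")

score_all_lines = similarity(old, new)                 # boilerplate counted
score_no_boiler = similarity(old, new, BOILERPLATE)    # boilerplate dropped
```

In this toy case the pair scores well above one half when every line counts, and exactly zero once the boilerplate lines are ignored, even though a person would call the two files similar.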