On Mon, Oct 29, 2007 at 10:06:11PM -0700, Linus Torvalds wrote:

> Have you compared the results? IOW, does it find the *same* renames?

From my limited testing, it generally finds the same pairs. However,
there are a number of renames that it _doesn't_ find, because they are
composed of "uninteresting" lines, which drops them below the minimum
score. Try (in git.git):

  git-show --raw -M -l0 :/'Big tool rename'

with the old and new code. Pairs like

  Documentation/git-add-script.txt -> Documentation/git-add.txt

are not found, because the file is composed almost entirely of
boilerplate. Moving the size normalization into the similarity engine
should probably fix that, and will let us compare old and new results
more accurately. I'll try to work on that.

> I'm a bit worried about the fact that you just pick a single (arbitrary)
> src/dst per fingerprint. Yes, it should be limited, but that seems to be a
> bit too *extremely* limited. But if it gives the same results in practice,
> maybe nobody cares?

Yes, I have not convinced myself yet that it's the right approach (but
it seemed like a good place to try first, for simplicity and speed). As
I noted, this approach seems to be a bit memory hungry on large inputs,
so I am a bit concerned about increasing the size of the
fingerprint_entry structure. However, Andy's sampling approach might
help fix that.

The current code also doesn't bother marking overflow, so common lines
get attributed to some random file (actually, worse than random: if a
bunch of files have the same common lines, _all_ of the lines will go to
the last file, which means we subtly favor renames from the end of the
input list).

So probably it should be tested in three ways: as-is, with an "overflow,
this line is too common to be interesting" bit, and with a small-ish
limit (I had at one point tried 5, but the implementation was naive and
too memory-hungry).
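To make the overflow bit and the small-ish limit a bit more concrete,
here is a rough sketch in Python. It is not the actual rename-detection
code: build_index, shared_line_counts, OVERFLOW and the dict-of-lists
layout are all names I made up for illustration, and it keys on the
line text itself where real code would presumably key on a hash.

```python
# Hypothetical sketch: per-line owner lists with a small cap and an
# overflow marker. Not taken from git's source.

OVERFLOW = None  # marker: "this line is too common to be interesting"


def build_index(files, limit=5):
    """Map each distinct line to the list of files that contain it.

    Once a line is seen in more than `limit` files, its entry becomes
    OVERFLOW and it is ignored from then on, so no single file (such as
    the last one in the input list) ends up owning the common lines.
    """
    index = {}
    for name, lines in files.items():
        for line in set(lines):  # count each line once per file
            if line not in index:
                index[line] = [name]
            elif index[line] is not OVERFLOW:
                index[line].append(name)
                if len(index[line]) > limit:
                    index[line] = OVERFLOW
    return index


def shared_line_counts(index, dst_lines):
    """Count, per source file, the distinct lines it shares with dst."""
    counts = {}
    for line in set(dst_lines):
        owners = index.get(line)
        if not owners:  # unknown line, or marked as overflow
            continue
        for name in owners:
            counts[name] = counts.get(name, 0) + 1
    return counts
```

With a cap like this, the memory per line is bounded by the limit, and
boilerplate shared by many files stops contributing to any score at
all, rather than being credited to whichever file came last.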
-Peff