On Tue, Oct 30, 2007 at 08:38:24AM -0700, Linus Torvalds wrote:

> > with the old and new code. Pairs like Documentation/git-add-script.txt
> > -> Documentation/git-add.txt are not found, because the file is composed
> > almost entirely of boilerplate.
>
> Ok, that does imply to me that we cannot just drop boilerplate text,
> because the fact is, lots of files contain boilerplate, but people still
> think they are "similar".

Well, the problem is that instead of just "dropping" boilerplate text, we
fail to count it as a similarity, but it still counts towards the file
size. It may be that just dropping it totally is the right thing (in
which case those renames _will_ turn up, because they will be filled with
identical non-boilerplate goodness).

> Hmm. I hope that is sufficient. But I suspect it may well not be.
> Especially since you ignore boiler-plate lines for *some* files but not
> others (ie it depends on which file you happen to find it in first).

Yes, that part bothers me a little, so I think a "too common, ignore"
overflow flag would at least be better.

But I think the best thing to do now is for me to shut up and see what
the results look like with the tweaks I have mentioned.

-Peff
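The scoring difference discussed above (boilerplate ignored as a similarity but still counted toward file size, versus dropped entirely) can be made concrete with a toy model. This is a sketch under stated assumptions, not git's diffcore code: it compares whole lines rather than hashed chunks, assumes the score is shared content divided by the larger file's size, and uses a hypothetical `too_common` helper with a made-up `MAX_FILES` threshold to stand in for the "too common, ignore" overflow flag. Because it counts every file before flagging anything, a line is ignored either everywhere or nowhere, regardless of which file it was first seen in.

```python
from collections import Counter

# Hypothetical threshold: a line present in more than this many files is
# treated as boilerplate. Toy value; the thread does not fix a number.
MAX_FILES = 3


def too_common(files, max_files=MAX_FILES):
    """Return the lines that appear in more than `max_files` files.

    All files are counted before any line is flagged, so a line is ignored
    either everywhere or nowhere, independent of file order.
    """
    files_containing = Counter()
    for lines in files:
        for line in set(lines):  # count each file at most once per line
            files_containing[line] += 1
    return {line for line, n in files_containing.items() if n > max_files}


def similarity(src, dst, ignore, drop_from_size):
    """Fraction of lines shared by src and dst, from 0.0 to 1.0.

    Lines in `ignore` never count as shared. With drop_from_size=False they
    still count toward the file size; with drop_from_size=True they are
    removed from the size as well.
    """
    src_counts, dst_counts = Counter(src), Counter(dst)
    shared = sum(min(n, dst_counts[line])
                 for line, n in src_counts.items() if line not in ignore)
    if drop_from_size:
        size = max(sum(n for line, n in src_counts.items() if line not in ignore),
                   sum(n for line, n in dst_counts.items() if line not in ignore))
    else:
        size = max(len(src), len(dst))
    return shared / size if size else 0.0


# A toy man page that is almost entirely boilerplate.
boiler = ["NAME", "SYNOPSIS", "DESCRIPTION", "OPTIONS",
          "Author", "Documentation", "GIT"]
old = boiler + ["git-add-script - add files to the index",
                "Adds the given paths to the index."]
new = boiler + ["git-add - add files to the index",
                "Adds the given paths to the index."]
others = [boiler + [f"text of page {i}"] for i in range(5)]

ignore = too_common([old, new] + others)
print(round(similarity(old, new, set(), False), 3))   # 0.889: boilerplate counted as shared
print(round(similarity(old, new, ignore, False), 3))  # 0.111: ignored, but still in size
print(round(similarity(old, new, ignore, True), 3))   # 0.5: dropped entirely
```

In this toy pair, ignoring the boilerplate while leaving it in the size pushes the score down to about 11%, even though the only real content is nearly unchanged; dropping it from the size as well brings the score back to 50%, the same figure as the default similarity threshold documented for `git diff -M`. A real implementation would more likely track hashed lines in a fixed-size table with a per-entry overflow flag; `Counter` is used here only for readability.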