On Fri, Feb 18, 2011 at 4:26 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> So please consider the attached patch just a "look, guys, this is
> wrong, and here's the ugliest hack you've ever seen to fix it".

Btw, the more I think about it, the more I suspect that the "estimate_similarity()" part of the patch is correct, or at least better than what we used to have.

If we have a file that expanded from 100 lines to 200 lines, and all of the old contents are there, then I think that logically people would expect it to be a "50% similarity".

But the thing is, with the old code, we would look at the old smaller size (100 lines), and take 50% of that. And then when the delta (also 100 lines) is bigger than that 50%, then we'd totally dismiss that thing from similarity analysis, because it obviously isn't similar enough.

So using the bigger size as the basis (and taking 50% of _that_ and comparing it to the delta) is probably the sane thing to do.

The rest of the patch I still think is total crap. The _intention_ is good, but the patch was written to be small rather than the right way of doing things.

                Linus
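
To make the 100-to-200-line arithmetic concrete, here is a minimal sketch of the early size check described above. It is not the actual git code: the function name `passes_size_filter`, its parameters, and the use of plain sizes with an integer percentage are all assumptions made for illustration.

```python
def passes_size_filter(src_size, dst_size, min_percent=50, use_larger=True):
    """Hypothetical early check: is the pair worth a full similarity analysis?

    src_size, dst_size: sizes of the old and new file (lines, in this example).
    min_percent: minimum similarity wanted, as an integer percentage.
    use_larger: True takes the bigger file as the basis (the patched behavior
                described in the mail), False takes the smaller one (the old
                behavior described in the mail).
    """
    larger = max(src_size, dst_size)
    smaller = min(src_size, dst_size)
    delta = larger - smaller
    basis = larger if use_larger else smaller

    # Dismiss the pair only when the size delta is bigger than the allowed
    # share of the basis. Integer math avoids floating-point rounding.
    return delta * 100 <= basis * (100 - min_percent)


# File grew from 100 to 200 lines, with all of the old contents still present.
old_basis = passes_size_filter(100, 200, use_larger=False)
new_basis = passes_size_filter(100, 200, use_larger=True)
print(old_basis, new_basis)
```

With the smaller size as the basis, the limit is 50% of 100 = 50, and the delta of 100 exceeds it, so the pair is dismissed. With the larger size as the basis, the limit is 50% of 200 = 100, the delta of 100 does not exceed it, and the pair stays in the analysis.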