Re: Basename matching during rename/copy detection

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 20 Jun 2007 20:42:38 -0700 (PDT)

On Wed, 20 Jun 2007, Shawn O. Pearce wrote:
> 
> I'm wondering if we shouldn't play the game of trying to match
> delete/add pairs up by not only similarity, but also by path
> basename.

I think we should just consider the basename as an "added 
similarity bonus".

IOW, we currently sort purely by data similarity, but how about just 
adding a small increment for "same base name"?

We could make it actually use the similarity of the filename itself as the 
basis for the increment, which would be even better, but the trivial thing 
is to do something like

	--- a/diffcore-rename.c
	+++ b/diffcore-rename.c
	@@ -186,8 +186,11 @@ static int estimate_similarity(struct diff_filespec *src,
	 	 */
	 	if (!dst->size)
	 		score = 0; /* should not happen */
	-	else
	+	else {
	 		score = (int)(src_copied * MAX_SCORE / max_size);
	+		if (basename_same(src, dst))
	+			score++;
	+	}
	 	return score;
	 }

and just implement that "basename_same()" function.
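
For illustration only, a minimal sketch of what such a "basename_same()" 
helper might look like. The struct below is a stripped-down stand-in, not 
git's real struct diff_filespec, and path_basename() is a hypothetical 
helper name; the only assumption is that each filespec carries its path as 
a NUL-terminated string with '/' separators.

```c
#include <string.h>

/* Hypothetical stand-in for git's struct diff_filespec; only the path matters here. */
struct diff_filespec {
	const char *path;
};

/* Return the part of the path after the last '/', or the whole path if there is none. */
static const char *path_basename(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/* Nonzero when both paths end in the same final component. */
static int basename_same(const struct diff_filespec *src,
			 const struct diff_filespec *dst)
{
	return !strcmp(path_basename(src->path), path_basename(dst->path));
}
```

With this, "lib/util/hash.c" and "src/hash.c" would get the bonus, while 
"src/hash.c" and "src/hash.h" would not.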

Or something.
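
The graded variant mentioned above, where the increment depends on how 
similar the two filenames are rather than on an exact match, could be 
sketched along these lines. Everything here is an assumption made for 
illustration: the function names, the max_bonus parameter (taken to be 
non-negative), and the choice of "common prefix plus common suffix" as the 
measure of name similarity.

```c
#include <string.h>

/* Return the part of the path after the last '/', or the whole path if there is none. */
static const char *last_component(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? slash + 1 : path;
}

/*
 * Hypothetical graded bonus: scale max_bonus by the fraction of the longer
 * basename that is covered by the common prefix plus the common suffix.
 */
static int name_similarity_bonus(const char *src_path, const char *dst_path,
				 int max_bonus)
{
	const char *a = last_component(src_path);
	const char *b = last_component(dst_path);
	size_t la = strlen(a), lb = strlen(b);
	size_t shorter = la < lb ? la : lb;
	size_t longer = la < lb ? lb : la;
	size_t prefix = 0, suffix = 0;

	if (!longer)
		return 0;
	while (prefix < shorter && a[prefix] == b[prefix])
		prefix++;
	/* Do not let the suffix overlap characters already counted in the prefix. */
	while (suffix < shorter - prefix &&
	       a[la - 1 - suffix] == b[lb - 1 - suffix])
		suffix++;
	return (int)((size_t)max_bonus * (prefix + suffix) / longer);
}
```

Identical basenames get the full max_bonus, and names that share only an 
extension get a small fraction of it. How large max_bonus should be 
relative to MAX_SCORE is left open here.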

I do agree that the filename logically can and probably _should_ count 
towards the "similarity". The filename _is_ part of the data in the global 
notion of "content", after all. It's the "index" to the data.

		Linus