On Wed, Aug 01, 2012 at 08:10:12AM +0700, Nguyen Thai Ngoc Duy wrote:

> > I do not think that is the right direction. Let's imagine that I have a
> > commit "A" and I annotate it (via notes or whatever) to say "between
> > A^^{tree} and A^{tree}, foo.c became bar.c". That will help me when
> > doing "git show" or "git log". But it will not help me when I later try
> > to merge "A" (or its descendent). In that case, I will compute the diff
> > between "A" and the merge-base (or worse, some descendent of "A" and the
> > merge-base), and I will miss this hint entirely.
> >
> > A much better hint is to annotate pairs of sha1s, to say "do not bother
> > doing inexact rename correlation on this pair; I promise that they have
> > value N".
>
> I haven't had time to think it through yet but I throw my thoughts in
> any way. I actually went with your approach first. But it's more
> difficult to control the renaming. Assume we want to tell git to
> rename SHA-1 "A" to SHA-1 "B". What happens if we have two As in the
> source tree and two Bs in the target tree? What happens if two As and
> one B, or one A and two Bs? What if a user defines A -> B and A -> C,
> and we happen to have two As in source tree and B and C in target
> tree?

Yes, it disregards path totally. But if you had the exact same movement
of content from one path to another in one instance, and it is
considered a rename, wouldn't it also be a rename in a second instance?

> There's also the problem with transferring this information. With
> git-notes I think I can transfer it (though not automatically). How do
> we transfer sha1 map (that you mentioned in the commit generation mail
> in this thread)?

That is orthogonal to the issue of what is being stored. I chose my
mmap'd disk implementation because it is very fast, which makes it nice
for a performance cache. But you could store the same thing in git-notes
(indexed by dst sha1, I guess, and then pointing to a blob of (src,
score) pairs).

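To make that layout concrete, here is a rough Python sketch of a map
indexed by dst sha1 whose entries are lists of (src, score) pairs. It is
only an illustration, not git's code: the names add_hint and
lookup_score, the 0-100 score scale, and the placeholder sha1s are all
assumptions made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the stored data:
# dst blob sha1 -> list of (src blob sha1, score) pairs.
RenameHints = Dict[str, List[Tuple[str, int]]]


def add_hint(hints: RenameHints, src: str, dst: str, score: int) -> None:
    """Record that blob `src` corresponds to blob `dst` with the given score."""
    hints.setdefault(dst, []).append((src, score))


def lookup_score(hints: RenameHints, src: str, dst: str) -> Optional[int]:
    """Return the recorded score for (src, dst), or None if no hint exists.

    A None result means the caller would fall back to the usual inexact
    rename detection for this pair.
    """
    for candidate, score in hints.get(dst, []):
        if candidate == src:
            return score
    return None


# Placeholder object names, not real blobs.
SRC = "a" * 40
DST = "b" * 40
OTHER = "c" * 40

hints: RenameHints = {}
add_hint(hints, SRC, DST, 90)  # assumed 0-100 scale

known = lookup_score(hints, SRC, DST)
unknown = lookup_score(hints, SRC, OTHER)
```

Because the key is the content pair and not a path or a commit, the same
lookup applies whichever two trees happen to be diffed.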
If you want to include path-based hints in a commit, I'd say that using
some micro-format in the commit message would be the simplest thing. But
that has been discussed before; ultimately the problem is that it only
covers _one_ diff that we do with that commit (it is probably the most
common, of course, but it doesn't cover them all).

> > Then it will find that pair no matter which trees or commits
> > are being diffed, and it will do so relatively inexpensively[1].
>
> But does that happen often in practice? I mean diff-ing two arbitrary
> trees and expect rename correction. I disregarded it as "git log" is
> my main case, but I'm just a single user..

It happens every time merge-recursive does rename detection, which
includes "git merge" but also things like "cherry-pick".

-Peff