In article <32541b131002031147r367ee08fxc64c4c54165953a3@xxxxxxxxxxxxxx>,
 Avery Pennarun <apenwarr@xxxxxxxxx> wrote:

> On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@xxxxxxxxxxx> wrote:
> > In article
> > Ah. That explains everything. Thanks. (I thought git mv was
> > equivalent to git rm followed by git add. But it's not.)
>
> I suppose in this case it's not. The only difference is when your
> work tree differs from your index, though, and it's to be expected
> that 'git rm', in removing things from the index, would lose your
> ability to track those differences.
>
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods? I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
>
> If you really want to know the details, looking at the code really is
> probably the best solution; it's not even that long.
>
> The short version is that git chooses a set of candidate blobs, then
> diffs them and figures out a percentage similarity between each pair.
> (A simple way to think of the similarity index is "how long is the
> diff compared to the file itself?" If the diff is of length zero, the
> similarity is 100%, and so on.) If the similarity is greater than a
> certain threshold, then it's considered to be the same file.
>
> Choosing the set of candidates is actually the more interesting
> problem, since detecting moves using the above algorithm is O(n^2)
> with the number of candidates. That's why 'git diff' and 'git log'
> don't do it at all by default.
>
> If you provide -M, the set of candidates is the set of files that were
> removed/modified and the set of files that were added. (Added files
> are compared against removed/modified files, iirc.) Normally that's a
> very short list.
> With -C, you need to compare all
> added/removed/modified files with all others, which is slightly more
> work. With --find-copies-harder, it becomes potentially a *lot* of
> work.

Thanks! That clarifies a lot.

rg
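
For illustration only, the "short version" quoted above (score every pair of candidates, then accept pairs whose similarity clears a threshold) can be sketched in Python. This is not git's code: Python's difflib is used as a stand-in for git's own similarity estimate, and the names similarity() and detect_renames() are hypothetical. The 0.5 default mirrors the 50% default that the git-diff documentation gives for -M.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(old: str, new: str) -> float:
    """Score two blobs from 0.0 to 1.0; identical content scores 1.0.

    Assumption: difflib's ratio stands in for git's similarity index.
    """
    return SequenceMatcher(None, old, new).ratio()


def detect_renames(
    removed: Dict[str, str],
    added: Dict[str, str],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Pair each added path with its most similar removed path.

    Every added blob is compared with every removed blob, so the cost
    grows as len(removed) * len(added). That quadratic growth is why
    keeping the candidate sets small matters.
    """
    renames = []
    for new_path, new_text in added.items():
        best_path, best_score = None, 0.0
        for old_path, old_text in removed.items():
            score = similarity(old_text, new_text)
            if score > best_score:
                best_path, best_score = old_path, score
        # Report a rename only when the best match meets the threshold.
        if best_path is not None and best_score >= threshold:
            renames.append((best_path, new_path, best_score))
    return renames
```

Passing only the removed and added files corresponds to the small candidate list described for -M. Widening `removed` to include modified or unchanged files is what makes the -C and --find-copies-harder cases progressively more expensive.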