Re: git-mv redux: there must be something else going on

In article <32541b131002031147r367ee08fxc64c4c54165953a3@xxxxxxxxxxxxxx>,
Avery Pennarun <apenwarr@xxxxxxxxx> wrote:

> On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@xxxxxxxxxxx> wrote:
> > Ah.  That explains everything.  Thanks.  (I thought git mv was
> > equivalent to git rm followed by git add.  But it's not.)
> 
> I suppose in this case it's not.  The only difference is when your
> work tree differs from your index, though, and it's to be expected
> that 'git rm', in removing things from the index, would lose your
> ability to track those differences.
> 
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods?  I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
> 
> If you really want to know the details, reading the code is probably
> the best approach; it's not even that long.
> 
> The short version is that git chooses a set of candidate blobs, then
> diffs them and computes a percentage similarity for each pair.
> (A simple way to think of the similarity index is "how long is the
> diff compared to the file itself?"  If the diff is of length zero, the
> similarity is 100%, and so on.)  If the similarity is greater than a
> certain threshold (50% by default), then it's considered to be the
> same file.
> 
> Choosing the set of candidates is actually the more interesting
> problem, since detecting moves using the above algorithm is O(n^2)
> in the number of candidates.  That's why 'git diff' and 'git log'
> don't do it at all by default.
> 
> If you provide -M, the candidates are the files that were removed and
> the files that were added: each added file is compared against the
> removed ones.  Normally that's a very short list.  With -C, files
> modified in the same commit are also considered as possible copy
> sources, which is slightly more work.  With --find-copies-harder,
> unmodified files are considered as sources too, so it becomes
> potentially a *lot* of work.

Thanks!  That clarifies a lot.
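
To make the description above concrete, here is a rough Python sketch of
the -M style pairing: every added file is scored against every removed
file, and pairs above a threshold are reported as renames.  This is only
an illustration, not git's code.  The names similarity() and
detect_renames() are made up for this example, and the score comes from
Python's difflib rather than from git's own similarity estimate, so the
numbers will not match what git prints.

```python
from difflib import SequenceMatcher


def similarity(old: bytes, new: bytes) -> float:
    """Return a 0..100 score; 100 means the contents are identical.

    Assumption: a line-based difflib ratio stands in for git's
    similarity index. Git computes its score differently.
    """
    if old == new:
        return 100.0
    matcher = SequenceMatcher(None, old.splitlines(), new.splitlines(),
                              autojunk=False)
    return 100.0 * matcher.ratio()


def detect_renames(removed, added, threshold=50.0):
    """Pair added paths with their most similar removed paths.

    removed and added map path -> file content (bytes).
    The nested loop is the O(n^2) part: every added file is
    compared against every removed file.
    """
    candidates = []
    for new_path, new_data in added.items():
        for old_path, old_data in removed.items():
            score = similarity(old_data, new_data)
            if score >= threshold:
                candidates.append((score, old_path, new_path))

    # Take the best-scoring pairs first; use each path at most once.
    candidates.sort(reverse=True)
    renames = {}
    used_old = set()
    for score, old_path, new_path in candidates:
        if old_path in used_old or new_path in renames:
            continue
        renames[new_path] = (old_path, score)
        used_old.add(old_path)
    return renames


if __name__ == "__main__":
    removed = {"a.txt": b"one\ntwo\nthree\nfour\n", "b.txt": b"xxx\nyyy\n"}
    added = {"c.txt": b"one\ntwo\nthree\nFOUR\n"}
    print(detect_renames(removed, added))
```

Widening the removed dictionary to include modified files, or every file
in the tree, is roughly what -C and --find-copies-harder do to the
candidate set, and it is why the cost grows so quickly.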

rg
