Re: git-mv redux: there must be something else going on

In article <alpine.LFD.2.00.1002031436490.1681@xxxxxxxxxxx>,
 Nicolas Pitre <nico@xxxxxxxxxxx> wrote:

> On Wed, 3 Feb 2010, Ron Garret wrote:
> 
> > So... how *does* git decide when two blobs are different blobs and when 
> > they are the same blob with mods?  I asked this question before and was 
> > pointed to the diffcore docs, but that didn't really clear things up.  
> > That just describes all the different ways git can do diffs, not the 
> > actual heuristics that git uses to track content.
> 
> Yes, those same heuristics are used to make the decision.
> 
> |The second transformation in the chain is diffcore-break, and is
> |controlled by the -B option to the 'git diff-*' commands.  
> |This is used to detect a filepair that represents "complete rewrite" 
> |and break such filepair into two filepairs that represent delete and
> |create.
> |[...]
> 
> |This transformation is used to detect renames and copies, and is
> |controlled by the -M option (to detect renames) and the -C option
> |(to detect copies as well) to the 'git diff-*' commands.  
> |[...]
> 
> Note that you may use the -B, -C, -M and --find-copies-harder arguments 
> with log as well as diff commands even if there is no actual diff 
> output.  So the explanation is really in that document, even though 
> simple rename detection involves only a fraction of what is said there.
> 
> And Git can detect copied files too.
> 
> Those semantics are not stored in the repository so they can be improved 
> or even changed after the fact.

OK, on closer reading I see that the information is there, but it's well 
hidden :-)  (For example, the -M option takes an optional numerical 
argument that sets how similar two files must be to count as a move.  
But the docs for git log don't mention this; it's buried deep in the 
git diffcore docs.  But yes, it's there.)
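
A rough way to see that threshold in action is sketched below. The
scratch repository, the file names (old.txt, new.txt) and the file
contents are made up for illustration; the only git features relied on
are -M<n>% and --name-status. About 70% of the content survives the
move, so a 50% threshold should report a rename and a 95% threshold
should report a separate delete and add.

```shell
#!/bin/sh
# Sketch: the same commit viewed with two different -M thresholds.
# Repository location, file names and contents are illustrative only.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Original file: 20 distinct lines.
i=1
while [ "$i" -le 20 ]; do
  echo "setting_$i = value_$i"
  i=$((i + 1))
done > old.txt
git add old.txt
git commit -q -m "add old.txt"

# Move the file, then rewrite its last 6 lines (14 of 20 lines survive).
git mv old.txt new.txt
i=1
while [ "$i" -le 20 ]; do
  if [ "$i" -le 14 ]; then
    echo "setting_$i = value_$i"
  else
    echo "option_$i : changed_$i"
  fi
  i=$((i + 1))
done > new.txt
git commit -q -am "move and edit"

# Low threshold: the pair should show up as a rename (status R).
loose=$(git diff -M50% --name-status HEAD~1 HEAD)
# High threshold: the pair should show up as a delete plus an add.
strict=$(git diff -M95% --name-status HEAD~1 HEAD)

echo "with -M50%:"; echo "$loose"
echo "with -M95%:"; echo "$strict"
```

The same option can be passed to git log, for example
git log -M90% --name-status, since nothing about the rename is recorded
in the commit itself.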

So I think I'm beginning to understand how this works, but that leads me 
to another question: it seems to me that there are potential screw cases 
for this purely content-based system of tracking files.  For example, 
suppose I have a directory full of sample config files, all of which are 
similar to each other.  Will that cause diffcore to get confused?

Feel free to treat that as a rhetorical question because obviously I can 
(and probably should) get the answer by trying it.
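
One way to try it is sketched here. Everything about the setup is an
assumption made for the experiment: three sample configs (a.conf,
b.conf, c.conf) that share 16 lines and differ in 4, two of which are
renamed and lightly edited in a single commit. It only probes this one
small case and is not a general statement about how diffcore behaves on
larger or more uniform directories.

```shell
#!/bin/sh
# Sketch: rename detection among deliberately similar config files.
# File names, contents and directory layout are illustrative only.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Write a config with 16 shared lines plus 4 lines specific to "$1".
make_conf() {
  i=1
  while [ "$i" -le 16 ]; do
    echo "common_$i = shared_$i"
    i=$((i + 1))
  done
  j=1
  while [ "$j" -le 4 ]; do
    echo "${1}_only_$j = ${1}_value_$j"
    j=$((j + 1))
  done
}

mkdir conf
for name in a b c; do
  make_conf "$name" > "conf/$name.conf"
done
git add conf
git commit -q -m "add sample configs"

# Rename two of the three look-alike files and tweak each one slightly.
git mv conf/a.conf conf/x.conf
git mv conf/b.conf conf/y.conf
echo "extra = x" >> conf/x.conf
echo "extra = y" >> conf/y.conf
git commit -q -am "rename and tweak two configs"

# Ask git how it pairs the removed and added paths.
pairs=$(git diff -M --name-status HEAD~1 HEAD)
echo "$pairs"
```

In this setup each new file is much closer to its true source than to
the other candidate, so the pairs should come out as a.conf to x.conf
and b.conf to y.conf. Shrinking the file-specific part, or making the
edits larger, is the obvious next thing to vary when hunting for a case
that does confuse it.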

Thanks!
rg

