In article <alpine.LFD.2.00.1002031436490.1681@xxxxxxxxxxx>, Nicolas Pitre <nico@xxxxxxxxxxx> wrote:

> On Wed, 3 Feb 2010, Ron Garret wrote:
>
> > So... how *does* git decide when two blobs are different blobs and when
> > they are the same blob with mods? I asked this question before and was
> > pointed to the diffcore docs, but that didn't really clear things up.
> > That just describes all the different ways git can do diffs, not the
> > actual heuristics that git uses to track content.
>
> Yes, those same heuristics are used to make the decision.
>
> |The second transformation in the chain is diffcore-break, and is
> |controlled by the -B option to the 'git diff-*' commands.
> |This is used to detect a filepair that represents "complete rewrite"
> |and break such filepair into two filepairs that represent delete and
> |create.
> |[...]
>
> |This transformation is used to detect renames and copies, and is
> |controlled by the -M option (to detect renames) and the -C option
> |(to detect copies as well) to the 'git diff-*' commands.
> |[...]
>
> Note that you may use the -B, -C, -M and --find-copies-harder arguments
> with log as well as diff commands even if there is no actual diff
> output. So the explanation is really in that document even if simple
> rename detection is concerned with only a fraction of what is said there.
>
> And Git can detect copied files too.
>
> Those semantics are not stored in the repository so they can be improved
> or even changed after the fact.

OK, on closer reading I see that the information is there, but it's well hidden :-) (For example, the -M option takes an optional numerical argument so you can tweak how much similarity is needed for a pair to be considered a move. But the docs for git log don't mention this. It's buried deep in the git diffcore docs. But yes, it's there.)
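To make the similarity-threshold idea concrete, here is a minimal Python sketch. It is not git's implementation: git's own scoring (in diffcore-rename) works on hashed chunks of file content rather than whole lines, and the names `similarity` and `detect_renames` are hypothetical. The sketch assumes a simple scheme: score every deleted/added pair, drop pairs below a threshold (0.5 here, mirroring the documented 50% default of -M), then take the highest-scoring pairs first, using each path at most once.

```python
from collections import Counter


def similarity(src: str, dst: str) -> float:
    """Hypothetical score in [0, 1]: lines shared by both files / line count of the larger file.

    Simplified stand-in for git's similarity index; git itself hashes
    chunks of content instead of comparing whole lines.
    """
    src_lines = Counter(src.splitlines())
    dst_lines = Counter(dst.splitlines())
    shared = sum((src_lines & dst_lines).values())  # multiset intersection
    larger = max(sum(src_lines.values()), sum(dst_lines.values()))
    return shared / larger if larger else 1.0  # two empty files count as identical


def detect_renames(deleted: dict, added: dict, threshold: float = 0.5) -> list:
    """Pair deleted paths with added paths whose score meets the threshold.

    Candidates are considered best-first; each path is used at most once,
    so leftover paths stay as plain deletes/adds.
    """
    candidates = sorted(
        ((similarity(old, new), src, dst)
         for src, old in deleted.items()
         for dst, new in added.items()),
        reverse=True,  # highest score first
    )
    used_src, used_dst, renames = set(), set(), []
    for score, src, dst in candidates:
        if score < threshold:
            break  # list is sorted, so every remaining pair scores lower
        if src in used_src or dst in used_dst:
            continue
        used_src.add(src)
        used_dst.add(dst)
        renames.append((src, dst, score))
    return renames


if __name__ == "__main__":
    old = {"sample.conf": "host=a\nport=1\nuser=x\nmode=y\n"}
    new = {"moved/sample.conf": "host=a\nport=1\nuser=x\nmode=z\n"}
    print(detect_renames(old, new))                 # 3 of 4 lines shared -> paired at 0.75
    print(detect_renames(old, new, threshold=0.9))  # stricter threshold -> []
```

With several near-identical candidates, this sketch simply keeps the highest-scoring pair and leaves the others as plain deletes and adds; how git itself breaks such ties is not something it tries to model.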
So I think I'm beginning to understand how this works, but that leads me to another question: it seems to me that there are potential screw cases for this purely content-based system of tracking files. For example, suppose I have a directory full of sample config files, all of which are similar to each other. Will that cause diffcore to get confused? Feel free to treat that as a rhetorical question because obviously I can (and probably should) get the answer by trying it.

Thanks!

rg