In article <ron1-34F9C6.12273203022010@xxxxxxxxxxxxxx>,
 Ron Garret <ron1@xxxxxxxxxxx> wrote:

> In article <alpine.LFD.2.00.1002031436490.1681@xxxxxxxxxxx>,
>  Nicolas Pitre <nico@xxxxxxxxxxx> wrote:
>
> > On Wed, 3 Feb 2010, Ron Garret wrote:
> >
> > > So... how *does* git decide when two blobs are different blobs and when
> > > they are the same blob with mods?  I asked this question before and was
> > > pointed to the diffcore docs, but that didn't really clear things up.
> > > That just describes all the different ways git can do diffs, not the
> > > actual heuristics that git uses to track content.
> >
> > Yes, those same heuristics are used to make the decision.
> >
> > |The second transformation in the chain is diffcore-break, and is
> > |controlled by the -B option to the 'git diff-{asterisk}' commands.
> > |This is used to detect a filepair that represents "complete rewrite"
> > |and break such filepair into two filepairs that represent delete and
> > |create.
> > |[...]
> >
> > |This transformation is used to detect renames and copies, and is
> > |controlled by the -M option (to detect renames) and the -C option
> > |(to detect copies as well) to the 'git diff-{asterisk}' commands.
> > |[...]
> >
> > Note that you may use the -B, -C, -M and --find-copies-harder arguments
> > with log as well as diff commands even if there is no actual diff
> > output.  So the explanation is really in that document even if simple
> > rename detection is concerned only by a fraction of what is said there.
> >
> > And Git can detect copied files too.
> >
> > Those semantics are not stored in the repository so they can be improved
> > or even changed after the facts.
>
> OK, on closer reading I see that the information is there, but it's well
> hidden :-)  (For example, the -M option takes an optional numerical
> argument so you can tweak how much similarity is needed to be considered
> a move.  But the docs for git log don't mention this.  It's buried deep
> in the git diffcore docs.  But yes, it's there.)
>
> So I think I'm beginning to understand how this works, but that leads me
> to another question: it seems to me that there are potential screw cases
> for this purely content-based system of tracking files.  For example,
> suppose I have a directory full of sample config files, all of which are
> similar to each other.  Will that cause diffcore to get confused?
>
> Feel free to treat that as a rhetorical question because obviously I can
> (and probably should) get the answer by trying it.

Actually, I think the answer is in Avery's post in another branch of
this thread.

rg
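
The "directory full of similar sample config files" case can be made concrete with a minimal Python sketch. This is not git's implementation: `difflib.SequenceMatcher` is only a stand-in for the similarity index that diffcore computes, and the names `similarity` and `detect_renames`, along with the file paths and contents, are hypothetical. The 0.5 threshold is chosen to mirror the 50% default of -M.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough stand-in for a similarity index, from 0.0 to 1.0 (not git's metric)."""
    return SequenceMatcher(None, a, b).ratio()


def detect_renames(deleted: dict, added: dict, threshold: float = 0.5) -> dict:
    """Pair each deleted path with its most similar added path.

    Candidates are considered best score first; pairs scoring below
    `threshold` are never reported as renames.
    """
    candidates = sorted(
        (
            (similarity(old_text, new_text), old_path, new_path)
            for old_path, old_text in deleted.items()
            for new_path, new_text in added.items()
        ),
        reverse=True,
    )
    renames = {}
    used_targets = set()
    for score, old_path, new_path in candidates:
        if score < threshold:
            break  # everything after this point is even less similar
        if old_path in renames or new_path in used_targets:
            continue  # each file takes part in at most one pairing
        renames[old_path] = (new_path, score)
        used_targets.add(new_path)
    return renames


# Hypothetical sample configs that differ only in the port number.
base = "host = localhost\nuser = git\n"
deleted = {"conf/a.sample": base + "port = 8080\n"}
added = {
    "examples/a.sample": base + "port = 8080\n",  # moved without changes
    "examples/b.sample": base + "port = 9090\n",  # a similar sibling
}

renames = detect_renames(deleted, added)
for old_path, (new_path, score) in renames.items():
    print(f"{old_path} -> {new_path} (similarity {score:.2f})")
```

In this toy model the unchanged file is paired with `examples/a.sample`, because an identical candidate scores 1.0 and is taken before any near-identical sibling. The ambiguity only shows up when the moved file was also edited, so that several siblings score about the same; raising the threshold then trades missed renames against wrong pairings.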