>>>>> "Linus" == Linus Torvalds <torvalds@xxxxxxxx> writes:

Linus> On Thu, 26 Oct 2006, Vincent Ladeuil wrote:
>>
>> Ok, so git make a distinction between the commit (code created by
>> someone) and the tree (code only).
>>
>> Commits are defined by their parents.

Linus> Commits are defined by a _combination_ of:

Linus>  - the tree they commit (which is recursive, so the
Linus>    commit name indirectly includes information EVERY
Linus>    SINGLE BIT in the whole tree, in every single file)

And here you keep that separate from any SCM-related info, right?

Linus>  - the parent(s) if any (which is also recursive, so
Linus>    the commit name indirectly includes information about
Linus>    EVERY SINGLE BIT in not just the current tree, but
Linus>    every tree in the history, and every commit that is
Linus>    reachable from it)

Linus>  - the author, committer, and dates of each (and
Linus>    committer is actually very often different from
Linus>    author)

Linus>  - the actual commit message

Linus> So a commit really names - uniquely and authoratively
Linus> - not just the commit itself, but everything ever
Linus> associated with it.

Thanks for the clarification. But no need to shout about EVERY
SINGLE BIT; the pointer to BDDs was already talking a bit about
bits :)

But I agree, this is the important point that may be missed.

>> Trees are defined by their content only ?

Linus> Where "contents" does include names and
Linus> permissions/types (eg execute bit and symlink etc).

Which can also be expressed as: "Everything the user can manipulate
outside the SCM context", right?

>> If that's the case, how do you proceed ?

Linus> If you compare the commit name, and they are equal,
Linus> you automatically know

Linus>  - the trees are 100% identical
Linus>  - the histories are 100% identical

And that's the only info you can get: there is no ordering here.

(Just pointing out the obvious: as soon as you try to put more info
into the signature, the equality will vanish.)
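To make the equality property concrete, here is a minimal sketch in Python. The "<type> <size>\0" framing is what git uses for its objects, but the commit body layout is simplified, and the helper names object_name and commit_name are made up for illustration; this is not git's code.

```python
import hashlib


def object_name(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<body>', the framing git uses for objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_name(tree: str, parents: list, author: str,
                committer: str, message: str) -> str:
    """Name a commit from its tree, parents, identities/dates and message.

    Simplified layout (assumption): one header line per field, a blank
    line, then the message.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return object_name("commit", "\n".join(lines).encode())


# Hypothetical inputs: the name of the empty tree and a made-up identity.
EMPTY_TREE = object_name("tree", b"")
WHO = "Someone <someone@example.org> 1161820800 +0000"

root = commit_name(EMPTY_TREE, [], WHO, WHO, "initial commit")
same = commit_name(EMPTY_TREE, [], WHO, WHO, "initial commit")
child = commit_name(EMPTY_TREE, [root], WHO, WHO, "initial commit")

print(root == same)   # same tree, parents, people, dates and message
print(root == child)  # only the parent list differs
```

Equal inputs give equal names, and changing any single input (here the parent list) gives a different name. Nothing in the two names alone says which commit came first.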
But for various optimizations, this equality property is the only
one needed. Do we agree?

Linus> If you only care about the actual tree, you compare
Linus> the tree name for equality, ie you can do

Linus>     git-rev-parse commit1^{tree} commit2^{tree}

Linus> and compare the two: if and only if they are equal are
Linus> the actual contents 100% equal.

Actually, strictly speaking, only one direction is guaranteed:
"their actual contents are equal" implies "their signatures are
equal". In theory, two totally different trees can have the same
signature.

My god! What a horror!

Not.

I even wonder if I will live long enough to see it occur...

So we *can* pretend that:

"their signatures are equal" is equivalent to "their contents are
equal"

And that's all we care about :)

But I digress. The question was about a detail of your tree
definition: once the signature is defined to be unique (as in
canonical), the property of comparing the signatures as if they
were the objects themselves follows. Thanks for the confirmation.

>> Calculate a sha1 representing the content (or the content
>> of the diff from parent) of all the files and dirs in the
>> tree ? Or from the sha1s of the files and dirs themselves
>> recursively based on sha1s of the files and dirs they
>> contain ?

Linus> The latter.

Thanks for the clarification.

So of course, finding the differences between the trees is quick:
you can prune anywhere the signatures are equal.

>> I ask because the later seems to provide some nice effects
>> similar to what makes BDD
>> (http://en.wikipedia.org/wiki/Binary_decision_diagram) so
>> efficient: you can compare graphs of any complexity or size in
>> O(1) by just comparing their signatures.

Linus> This is exactly what git does. You can compare entire
Linus> trees (and subdirectories are just other trees) by
Linus> just comparing 20 bytes of information.

I have understood that for years. I have a bit of practice with
BDDs and I am accustomed to that lovely property.
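For readers without that practice, the recursive signature and the pruning it allows can be sketched in a few lines of Python. Everything here is an assumption made for illustration: directories are plain dicts, files are bytes, permissions are ignored, and the entry encoding inside tree_name is not git's on-disk tree format. Git also stores each name once instead of recomputing it on every comparison, which is what makes the comparison O(1).

```python
import hashlib


def blob_name(data: bytes) -> str:
    """Signature of a file: SHA-1 over 'blob <size>\\0<content>'."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def tree_name(tree: dict) -> str:
    """Signature of a directory, built from the signatures of its entries."""
    entries = []
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            entries.append(f"dir {name} {tree_name(child)}")
        else:
            entries.append(f"file {name} {blob_name(child)}")
    body = "\n".join(entries).encode()
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()


def diff(a: dict, b: dict, prefix: str = "") -> list:
    """List the paths that differ, skipping subtrees with equal signatures."""
    if tree_name(a) == tree_name(b):
        return []  # equal signatures: prune the whole subtree
    changed = []
    for name in sorted(set(a) | set(b)):
        path = prefix + name
        x, y = a.get(name), b.get(name)
        if isinstance(x, dict) and isinstance(y, dict):
            changed += diff(x, y, path + "/")
        elif x != y:
            changed.append(path)
    return changed


old = {"README": b"hi\n",
       "doc": {"x.txt": b"x\n"},
       "src": {"a.c": b"int a;\n", "b.c": b"int b;\n"}}
new = {"README": b"hi\n",
       "doc": {"x.txt": b"x\n"},
       "src": {"a.c": b"int a;\n", "b.c": b"int b = 1;\n"}}

print(diff(old, new))  # ['src/b.c']
```

The "doc" subtree is never descended into: its two signatures match, so its files are not looked at.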
But without that practice, I think most people will just wonder...

<snip/>

Linus> And the reason it's fast is that we can compare 20,000
Linus> files (names, contents, permissions) by just comparing
Linus> a _single_ 20-byte SHA1.

Yeah, let's go further! We can compare gazillions of files and
their history since the epoch by comparing _two_ signatures! :-)

Linus> In git, revision names (and _everything_ has a
Linus> revision name: commits, trees, blobs, tags) really
Linus> have meaning. They're not just random noise.

I know that effect, but I understand people complaining that they
*look* like noise.

I'm still searching for a parallel in nature, but the best I could
find is DNA. Ever looked at DNA? Looks like noise, no? No ordering
either between parents and children... But there is a way to
identify a parent from the DNA of a child...

        Vincent