Sam Vilain <sam@xxxxxxxxxx> writes:

>> As to the implementation, checksumming the collection of raw objects is
>> certainly superior to tar. Colin had suggested sorting the objects by
>> checksum, but I don't think that is necessary. Just stream the commit
>> object, then its tree object, followed by the content of each object
>> listed in the tree, recursing into subtrees as necessary. That will be a
>> stable stream for a given commit, or tree.
>
> I would really just do it exactly the same way that git does: checksum
> the objects including their headers with the new hashes.

I tend to agree that this is a good idea. I also suspect it would make
the implementation simpler by allowing it to share more code, but I
didn't look into it too deeply.

> I have a
> hazy recollection of what it would take to replace SHA-1 in git with
> something else; it should be possible (though tricky) to do it lazily,
> where a tree entry has bits (eg, some of the currently unused file
> mode bits) to denotes which hash algorithm is in use for the entry.
> However I don't think that got past idea stage...

I think one reason it didn't is that it would not work well. That
"bit that tells whether this is a new object or an old one" would mean
that a single tree can have many different object names, depending on
which of its component entries use that bit and which don't. There goes
the "we know two trees with the same object name are identical without
recursing into them" optimization, and with it the converse that makes
it useful: two identical trees would no longer be guaranteed to have
the same object name.

Also, once you start saying that a tree object can be encoded in more
than one way, it would become impossible to do what you suggest Joey
do, i.e. "exactly the same way that git does", wouldn't it?
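
To make the objection concrete, here is a rough Python sketch. It is not
git's code. It names objects the way git does (hash of "<type> <size>",
a NUL byte, then the body) and then builds the same two-file tree twice:
once with plain SHA-1 entries, and once with one entry switched to a
different hash and marked by a mode bit. The flag NEW_HASH_BIT, its
value, and the choice of SHA-256 as the "new" hash are made up purely
for illustration; nothing like them exists in git's tree format.

```python
import hashlib

def object_name(obj_type: str, body: bytes, algo: str = "sha1") -> bytes:
    """Name an object git-style: hash "<type> <size>\\0" followed by the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()

def tree_body(entries):
    """Serialize (mode, name, raw_object_name) entries as a tree body.

    Each entry is "<octal mode> <name>\\0<raw hash bytes>". Real git also
    sorts subdirectories as if their names ended in "/"; this sketch only
    has files, so a plain sort by name is enough.
    """
    return b"".join(
        format(mode, "o").encode() + b" " + name.encode() + b"\0" + raw
        for mode, name, raw in sorted(entries, key=lambda e: e[1])
    )

# Hypothetical: an unused mode bit meaning "this entry uses the new hash".
NEW_HASH_BIT = 0o1000000
REGULAR_FILE = 0o100644

readme = b"hello\n"
main_c = b"int main(void) { return 0; }\n"

# Every entry named with SHA-1, as today.
old_tree = tree_body([
    (REGULAR_FILE, "README", object_name("blob", readme)),
    (REGULAR_FILE, "main.c", object_name("blob", main_c)),
])

# Same file contents, but one entry lazily migrated to the new hash.
mixed_tree = tree_body([
    (REGULAR_FILE, "README", object_name("blob", readme)),
    (REGULAR_FILE | NEW_HASH_BIT, "main.c",
     object_name("blob", main_c, "sha256")),
])

old_name = object_name("tree", old_tree)
mixed_name = object_name("tree", mixed_tree)

print("all SHA-1 entries:", old_name.hex())
print("one flagged entry:", mixed_name.hex())
print("same tree name?   ", old_name == mixed_name)
```

The two trees describe identical content, yet their serialized bytes
differ, so they get different object names. With N entries there are up
to 2^N such encodings of one and the same tree, which is why comparing
tree names alone would stop being enough to tell whether two trees are
identical.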