Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@xxxxxxxxxx> wrote:
>> I support this proposal, as someone who no longer releases tarballs
>> of my software, when I can possibly avoid it. I have worried about
>> signed tags / commits only being a SHA1 break away from useless.
>>
>> As to the implementation, checksumming the collection of raw objects is
>> certainly superior to tar. Colin had suggested sorting the objects by
>> checksum, but I don't think that is necessary. Just stream the commit
>> object, then its tree object, followed by the content of each object
>> listed in the tree, recursing into subtrees as necessary. That will be a
>> stable stream for a given commit, or tree.
>
> It could be simplified a bit by using ls-tree -r (so you basically
> have a single big tree). Then hash commit, ls-tree -r output and all
> blobs pointed by ls-tree in listed order.

What problem are you trying to solve here, though, by deliberately
deviating from what Git internally uses to store these objects? If it
is OK to ignore the tree boundary, then you probably do not even
need trees in this secondary hash for validation in the first place.

For example, you can hash a stream:

    <commit object contents> +
    N * (<pathname> + NUL + <blob object contents>)

as long as the <pathname>s are sorted in a predictable order (like
in "the index order") in the output. That would be even simpler (I
am not saying it is necessarily better, and by inference neither is
your "simplification").

I was about to suggest another alternative. Pretend as if Git
internally used SHA-512 (or whatever hash you want to use) instead
of SHA-1, and compute the object names that way. The contents of a
tree object would be recomputed by replacing each 20-byte SHA-1 field
in it with a field of whatever length is necessary to hold the longer
object names of the elements in the tree.

But then a realization hit me: what new value will be placed in the
"parent " field in the commit object?
You cannot have a SHA-512 variant of the commit object name without
recomputing the whole history.

Now, if the final objective is to replace signatures of tarballs,
does it matter to cover the commit object, or is it sufficient to
cover the tree contents?

Among the ideas raised so far, I like what Joey suggested, combined
with "each should have '<type> <length>NUL' header" from Sam Vilain
the best. That is, hash the stream:

    "commit <length>" NUL + <commit object contents> +
    "tree <length>" NUL + <top level tree contents> +
    ... list the entries in the order you would find by
    ... some defined traversal order people can agree on.

with whatever the preferred strong hash function of the age is.
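
To make the two stream layouts above concrete, here is a minimal
sketch in Python. It is only an illustration, not a proposed
implementation: the function names (frame, framed_stream_digest,
flat_stream_digest) and the choice of SHA-512 as the default are my
assumptions, and the traversal order is deliberately left to the
caller because it has not been agreed on. The flat variant uses a
plain byte-wise sort of full pathnames as a stand-in for "the index
order".

```python
import hashlib


def frame(obj_type: str, body: bytes) -> bytes:
    """Prefix body with a '<type> <length>NUL' header.

    This is the same framing Git applies before hashing an object.
    """
    return f"{obj_type} {len(body)}".encode("ascii") + b"\0" + body


def framed_stream_digest(objects, algo: str = "sha512") -> str:
    """Hash a sequence of (type, contents) pairs, each with its header.

    The caller must supply the objects in the agreed traversal order
    (commit first, then the top-level tree, and so on); this function
    does not decide that order.
    """
    h = hashlib.new(algo)
    for obj_type, body in objects:
        h.update(frame(obj_type, body))
    return h.hexdigest()


def flat_stream_digest(commit_body: bytes, files, algo: str = "sha512") -> str:
    """Hash <commit contents> + N * (<pathname> + NUL + <blob contents>).

    files maps full pathnames (bytes) to blob contents (bytes).
    Pathnames are sorted byte-wise, which is an assumption standing in
    for the index order.
    """
    h = hashlib.new(algo)
    h.update(commit_body)
    for path in sorted(files):
        h.update(path + b"\0" + files[path])
    return h.hexdigest()
```

In practice the commit, tree and blob contents would come from
something like "git cat-file <type> <object>", fed to
framed_stream_digest() in whatever traversal order ends up being
agreed on. With SHA-1 and a single framed object, frame() yields
exactly the bytes Git hashes to name that object, which is a handy
sanity check on the header format.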