On Tue, Mar 3, 2015 at 6:44 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@xxxxxxxxxx> wrote:
>>> I support this proposal, as someone who no longer releases tarballs
>>> of my software, when I can possibly avoid it. I have worried about
>>> signed tags / commits only being a SHA1 break away from useless.
>>>
>>> As to the implementation, checksumming the collection of raw objects is
>>> certainly superior to tar. Colin had suggested sorting the objects by
>>> checksum, but I don't think that is necessary. Just stream the commit
>>> object, then its tree object, followed by the content of each object
>>> listed in the tree, recursing into subtrees as necessary. That will be a
>>> stable stream for a given commit, or tree.
>>
>> It could be simplified a bit by using ls-tree -r (so you basically
>> have a single big tree). Then hash commit, ls-tree -r output and all
>> blobs pointed by ls-tree in listed order.
>
> What problem are you trying to solve here, though, by deliberately
> deviating what Git internally used to store these objects? If it is
> OK to ignore the tree boundary, then you probably do not even need
> trees in this secondary hash for validation in the first place.
>
> For example, you can hash a stream:
>
>     <commit object contents> +
>     N * (<pathname> + NUL + <blob object contents>)
>
> as long as the <pathname>s are sorted in a predictable order (like
> in "the index order") in the output. That would be even simpler (I
> am not saying it is necessarily better, and by inference neither is
> your "simplification").

I did nearly that [1]. But this morning I realized trees carry file
permission. We should keep that in the final checksum as well.

> Now, if the final objective is to replace signature of tarballs,
> does it matter to cover the commit object, or is it sufficient to
> cover the tree contents?
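A minimal sketch of the flat stream Junio describes, extended with each entry's file mode per Duy's point that trees carry permissions. Everything here is an assumption for illustration, not something proposed in the thread: the hypothetical function name `flat_digest`, SHA-256 standing in for the "strong hash", and a decimal length prefix before each blob (without some framing, one blob's bytes could imitate the start of the next entry).

```python
import hashlib


def flat_digest(commit, entries, algo="sha256"):
    """Hash commit contents, then every blob keyed by its mode and full path.

    entries: iterable of (mode, path, blob) byte-string triples, e.g.
    (b"100755", b"bin/run", b"#!/bin/sh\n").
    """
    h = hashlib.new(algo)
    h.update(commit)
    # Byte-wise order on the full path, the same order the index uses.
    for mode, path, blob in sorted(entries, key=lambda e: e[1]):
        # Mode is hashed so a permission change (100644 -> 100755) is detected.
        h.update(mode + b" " + path + b"\0")
        # Assumed length prefix so blob boundaries cannot be confused.
        h.update(str(len(blob)).encode() + b"\0")
        h.update(blob)
    return h.hexdigest()


commit = b"example commit contents\n"
files = [
    (b"100644", b"src/main.c", b"int main(void) { return 0; }\n"),
    (b"100644", b"README", b"hello\n"),
]
# Input order does not matter; the path sort fixes it.
assert flat_digest(commit, files) == flat_digest(commit, files[::-1])
# Flipping the executable bit changes the digest.
files_x = [(b"100755", path, data) for _, path, data in files]
assert flat_digest(commit, files) != flat_digest(commit, files_x)
```

Sorting on the full path is what makes the stream independent of how the caller collected the entries; the cost, as Junio notes, is that tree boundaries disappear from what gets hashed.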
>
> Among the ideas raised so far, I like what Joey suggested, combined
> with "each should have '<type> <length>NUL' header" from Sam Vilain
> the best. That is, hash the stream:
>
>     "commit <length>" NUL + <commit object contents> +
>     "tree <length>" NUL + <top level tree contents> +
>     ... list the entries in the order you would find by
>     ... some defined traversal order people can agree on.
>
> with whatever the preferred strong hash function of the age.

A bit harder to script, but simpler to provide from cat-file, I think.

[1] http://article.gmane.org/gmane.comp.version-control.git/260211
--
Duy
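The stream Junio favors (Joey's commit, tree, then contents traversal, with Sam Vilain's "<type> <length>NUL" header on every object) might look like the sketch below. Assumptions not settled in the thread: objects live in an in-memory dict instead of being read from a repository, the agreed traversal is depth-first in tree-entry order with each subtree expanded where it is listed, and SHA-256 stands in for "the preferred strong hash function of the age". Tree contents are serialized in Git's raw tree format, with entry ids computed by SHA-1 as Git does today, intended to match the bytes `git cat-file tree <id>` prints for the same data. All helper names (`framed`, `object_id`, `sorted_entries`, `tree_body`, `stream_digest`) are hypothetical.

```python
import hashlib

TREE_MODE = b"40000"  # mode string Git writes for subtree entries


def framed(obj_type, body):
    """Prefix body with the "<type> <length>NUL" object header."""
    return f"{obj_type} {len(body)}".encode() + b"\0" + body


def object_id(obj_type, body):
    """Raw 20-byte SHA-1 object name, computed the way Git does."""
    return hashlib.sha1(framed(obj_type, body)).digest()


def sorted_entries(tree):
    """Yield (name, mode, child) in Git tree order (subtrees sort as "name/")."""
    def key(name):
        return name + b"/" if tree[name][0] == TREE_MODE else name
    for name in sorted(tree, key=key):
        mode, child = tree[name]
        yield name, mode, child


def tree_body(tree):
    """Serialize {name: (mode, blob_bytes or subtree_dict)} as raw tree contents."""
    out = []
    for name, mode, child in sorted_entries(tree):
        if mode == TREE_MODE:
            oid = object_id("tree", tree_body(child))
        else:
            oid = object_id("blob", child)
        out.append(mode + b" " + name + b"\0" + oid)
    return b"".join(out)


def stream_digest(commit, tree, algo="sha256"):
    """Hash the framed commit, then each tree followed by its entries, depth-first."""
    h = hashlib.new(algo)
    h.update(framed("commit", commit))

    def walk(t):
        h.update(framed("tree", tree_body(t)))
        for _name, mode, child in sorted_entries(t):
            if mode == TREE_MODE:
                walk(child)  # expand the subtree right where it is listed
            else:
                h.update(framed("blob", child))

    walk(tree)
    return h.hexdigest()


tree = {
    b"README": (b"100644", b"hello\n"),
    b"src": (TREE_MODE, {b"main.c": (b"100644", b"int main(void) { return 0; }\n")}),
}
print(stream_digest(b"example commit contents\n", tree))
```

Because each tree is hashed in its raw form, the mode of every entry is covered without any extra step, which addresses the permission concern raised earlier. The trade-off Duy mentions shows up here too: the stream is awkward to build from shell plumbing but falls out naturally from code that already reads raw objects.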