Re: weaning distributions off tarballs: extended verification of git tags

On Tue, Mar 3, 2015 at 6:44 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@xxxxxxxxxx> wrote:
>>> I support this proposal, as someone who no longer releases tarballs
>>> of my software, when I can possibly avoid it. I have worried about
>>> signed tags / commits only being a SHA1 break away from useless.
>>>
>>> As to the implementation, checksumming the collection of raw objects is
>>> certainly superior to tar. Colin had suggested sorting the objects by
>>> checksum, but I don't think that is necessary. Just stream the commit
>>> object, then its tree object, followed by the content of each object
>>> listed in the tree, recursing into subtrees as necessary. That will be a
>>> stable stream for a given commit, or tree.
>>
>> It could be simplified a bit by using ls-tree -r (so you basically
>> have a single big tree). Then hash commit, ls-tree -r output and all
>> blobs pointed to by ls-tree in listed order.
>
> What problem are you trying to solve here, though, by deliberately
> deviating from what Git internally uses to store these objects?  If it is
> OK to ignore the tree boundary, then you probably do not even need
> trees in this secondary hash for validation in the first place.
>
> For example, you can hash a stream:
>
>     <commit object contents> +
>     N * (<pathname> + NUL + <blob object contents>)
>
> as long as the <pathname>s are sorted in a predictable order (like
> in "the index order") in the output.  That would be even simpler (I
> am not saying it is necessarily better, and by inference neither is
> your "simplification").

I did nearly that [1]. But this morning I realized that trees carry
file permissions. We should keep those in the final checksum as well.
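
For illustration, a minimal Python sketch of the flat stream with the
mode added to each entry. The function name, the (mode, path, blob)
tuple layout, the "<mode> <path>" NUL framing and the SHA-256 default
are all assumptions made up for this example, not anything agreed on
in this thread. Entries are sorted by raw path bytes as a stand-in for
"the index order".

```python
import hashlib


def flat_stream_digest(commit_body, entries, algo="sha256"):
    """Hash <commit> + N * (<mode> SP <path> NUL <blob>).

    commit_body: raw commit object contents (bytes).
    entries: iterable of (mode, path, blob) tuples, all bytes.
    The layout is a hypothetical one chosen for this sketch.
    """
    h = hashlib.new(algo)
    h.update(commit_body)
    # Sort by path bytes so the stream does not depend on input order.
    for mode, path, blob in sorted(entries, key=lambda e: e[1]):
        h.update(mode + b" " + path + b"\0" + blob)
    return h.hexdigest()


if __name__ == "__main__":
    entries = [
        (b"100755", b"run.sh", b"#!/bin/sh\n"),
        (b"100644", b"README", b"hello\n"),
    ]
    print(flat_stream_digest(b"tree ...\n\nmessage\n", entries))
```

With the mode in the stream, flipping the executable bit on a file
changes the digest. Note that the blobs are not length-prefixed here,
so entry boundaries are not recoverable from the stream alone.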

> Now, if the final objective is to replace signature of tarballs,
> does it matter to cover the commit object, or is it sufficient to
> cover the tree contents?
>
> Among the ideas raised so far, I like what Joey suggested, combined
> with "each should have '<type> <length>NUL' header" from Sam Vilain
> the best.  That is, hash the stream:
>
>     "commit <length>" NUL + <commit object contents> +
>     "tree <length>" NUL + <top level tree contents> +
>     ... list the entries in the order you would find by
>     ... some defined traversal order people can agree on.
>
> with whatever the preferred strong hash function of the age is.

A bit harder to script, but simpler to provide from cat-file, I think.

[1] http://article.gmane.org/gmane.comp.version-control.git/260211
-- 
Duy