Re: weaning distributions off tarballs: extended verification of git tags

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Tue, Mar 3, 2015 at 1:12 AM, Joey Hess <id@xxxxxxxxxx> wrote:
>> I support this proposal, as someone who no longer releases tarballs
>> of my software, when I can possibly avoid it. I have worried about
>> signed tags / commits only being a SHA1 break away from useless.
>>
>> As to the implementation, checksumming the collection of raw objects is
>> certainly superior to tar. Colin had suggested sorting the objects by
>> checksum, but I don't think that is necessary. Just stream the commit
>> object, then its tree object, followed by the content of each object
>> listed in the tree, recursing into subtrees as necessary. That will be a
>> stable stream for a given commit, or tree.
>
> It could be simplified a bit by using ls-tree -r (so you basically
> have a single big tree). Then hash commit, ls-tree -r output and all
> blobs pointed by ls-tree in listed order.

What problem are you trying to solve here, though, by deliberately
deviating from what Git internally uses to store these objects?  If
it is OK to ignore the tree boundary, then you probably do not even
need trees in this secondary hash for validation in the first place.

For example, you can hash a stream:

    <commit object contents> +
    N * (<pathname> + NUL + <blob object contents>)

as long as the <pathname>s are sorted in a predictable order (like
in "the index order") in the output.  That would be even simpler (I
am not saying it is necessarily better, and by inference neither is
your "simplification").
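
A minimal sketch of that stream, only to make the shape concrete. It
assumes the commit contents and the blobs are already in memory; the
function name and the dict-of-paths input are made up for illustration,
and plain byte-wise sorting of the path names stands in for "the index
order".

```python
import hashlib


def hash_commit_and_blobs(commit_body, blobs, algo="sha512"):
    """Hash <commit contents> + N * (<pathname> + NUL + <blob contents>).

    commit_body: raw commit object contents (bytes).
    blobs: hypothetical mapping of full pathname (bytes) -> blob contents.
    """
    h = hashlib.new(algo)
    h.update(commit_body)
    # Sorting bytes objects compares them byte by byte, so the result
    # does not depend on the order the caller supplied the paths in.
    for path in sorted(blobs):
        h.update(path + b"\0" + blobs[path])
    return h.hexdigest()


if __name__ == "__main__":
    files = {b"src/main.c": b"int main(void) { return 0; }\n",
             b"README": b"hello\n"}
    print(hash_commit_and_blobs(b"tree ...\n\nexample commit\n", files))
```

The digest covers file names and contents but no tree objects at all,
which is the point made above.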

I was about to suggest another alternative.

    Pretend as if Git internally used SHA-512 (or whatever hash you
    want to use) instead of SHA-1, and compute the object names that
    way.  Recompute the contents of a tree object by replacing the
    20-byte SHA-1 field in it with a field of whatever length is
    necessary to hold the longer object names of elements in the
    tree.

But then a realization hit me: what new value will be placed in the
"parent " field in the commit object?  You cannot have SHA-512
variant of commit object name without recomputing the whole history.
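
For trees and blobs the idea is still workable, and a rough sketch may
help show what "recompute the contents of a tree object" means. The
`store` dict (raw 20-byte SHA-1 name -> (type, contents)) and all the
function names are assumptions for illustration, not anything Git
provides. Submodule entries are not handled, and commits are left out
for the reason given above.

```python
import hashlib


def object_name(obj_type, body, algo="sha1"):
    """Name an object the way Git does: hash "<type> <length>" NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def parse_tree(body):
    """Split raw tree contents into (mode, name, 20-byte SHA-1) entries."""
    entries, i = [], 0
    while i < len(body):
        sp = body.index(b" ", i)        # end of the mode field
        nul = body.index(b"\0", sp)     # end of the entry name
        entries.append((body[i:sp], body[sp + 1:nul], body[nul + 1:nul + 21]))
        i = nul + 21
    return entries


def rewrite_tree(body, store, algo="sha512"):
    """Replace each 20-byte SHA-1 field with the entry's longer name."""
    return b"".join(
        mode + b" " + name + b"\0" + strong_name(sha1, store, algo)
        for mode, name, sha1 in parse_tree(body)
    )


def strong_name(sha1, store, algo="sha512"):
    """Recompute the name of a blob or tree under the stronger hash."""
    obj_type, body = store[sha1]
    if obj_type == "tree":
        body = rewrite_tree(body, store, algo)
    return object_name(obj_type, body, algo)


if __name__ == "__main__":
    blob = b"hello\n"
    blob_id = object_name("blob", blob)
    tree = b"100644 greeting\0" + blob_id
    tree_id = object_name("tree", tree)
    store = {blob_id: ("blob", blob), tree_id: ("tree", tree)}
    print(strong_name(tree_id, store).hex())
```

Nothing here needs history: a tree only names other trees and blobs, so
the recomputation stays within one snapshot.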

Now, if the final objective is to replace signatures on tarballs,
does it matter whether the commit object is covered, or is it
sufficient to cover the tree contents?

Among the ideas raised so far, the one I like best is what Joey
suggested, combined with the "each should have '<type> <length>NUL'
header" idea from Sam Vilain.  That is, hash the stream:

    "commit <length>" NUL + <commit object contents> +
    "tree <length>" NUL + <top level tree contents> +
    ... list the entries in the order you would find by
    ... some defined traversal order people can agree on.

with whatever strong hash function is preferred at the time.
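
To make the stream above concrete, here is a sketch under a few loudly
stated assumptions: objects come from a hypothetical in-memory `store`
(raw 20-byte SHA-1 name -> (type, contents)), the traversal order is
depth-first in the order entries appear in each tree (nobody has agreed
on that; it is just one candidate), and submodule entries are skipped.
The function names are invented for the example.

```python
import hashlib


def framed(obj_type, body):
    """Prefix contents with the "<type> <length>" NUL header."""
    return f"{obj_type} {len(body)}".encode() + b"\0" + body


def tree_entries(body):
    """Yield (mode, 20-byte SHA-1) for each entry in raw tree contents."""
    i = 0
    while i < len(body):
        sp = body.index(b" ", i)
        nul = body.index(b"\0", sp)
        yield body[i:sp], body[nul + 1:nul + 21]
        i = nul + 21


def stream(name, store):
    """Yield framed objects: commit, its tree, then entries depth-first."""
    obj_type, body = store[name]
    yield framed(obj_type, body)
    if obj_type == "commit":
        # A commit's contents start with the line "tree <40 hex digits>".
        tree_hex = body.split(b"\n", 1)[0].split(b" ")[1]
        yield from stream(bytes.fromhex(tree_hex.decode()), store)
    elif obj_type == "tree":
        for mode, child in tree_entries(body):
            if mode != b"160000":       # assumption: skip submodules
                yield from stream(child, store)


def extended_digest(commit_name, store, algo="sha512"):
    """Hash the whole framed stream with the chosen strong hash."""
    h = hashlib.new(algo)
    for chunk in stream(commit_name, store):
        h.update(chunk)
    return h.hexdigest()
```

Because every object carries its own length, a reader of the stream can
tell where one object ends and the next begins without trusting any
SHA-1 name along the way.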