Re: Commit SHA1 == SHA1 checksum?

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 07 Feb 2022 14:49:16 -0800

Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Feb 07, 2022 at 12:57:55PM -0800, Junio C Hamano wrote:
>> You are solving a different problem: "I have this tar archive; what
>> git tree object would I get if I extract this archive to an empty
>> directory and said 'git add . && git write-tree'?".
>> 
>> I agree that one is computable.
>
> So, I was brainstorming about this today, and I'm curious if you think this
> would be a useful feature to have, maybe even natively?
>
> E.g. here's a scenario:
>
> "git archive -S <signed-object>" creates an additional file that is added to
> the generated tar/zip archive -- for example, a ${prefix}.GIT_ARCHIVE_SIG. That
> file contains the raw contents of the signed tag and/or the signed commit.
>
> "git verify-archive" would look for a toplevel .GIT_ARCHIVE_SIG file. If it's
> present, it would verify the signature on these "detached" signed objects to
> get a trusted tree hash. Then it would compute the tree hash of the tar
> archive (minus the .GIT_ARCHIVE_SIG file) to see if it matches.
>
> In my mind, that would provide the following benefits over the current
> practice of detached .sig files:
>
> 1. environments like github/git.kernel.org would be able to create verifiable
>    snapshot archives using an existing set of signed objects
> 2. packagers would be able to perform cryptographic verification without
>    needing to track any extra sources like corresponding .sig files; they
>    would just need to add a build-time dependency on git (plus whatever it
>    calls for cryptographic verification, such as gnupg or openssh)
> 3. this would automatically support all git-native signature mechanisms like
>    openssh and whatever else gets added in the future
>
> Does this idea have any merit, or is it too fragile/crazy to bother?

I may choose details differently at the implementation level (instead
of an extra file, I'd see if we can add it as pax_extended_header, for
example), but I think that is workable and might even be useful,
provided that I am not misunderstanding your idea.  So let me try
rephrasing it to see how it would work.

Given a signed commit or a signed tag that points at a commit, your
enhanced "git archive" would create a .tar file with the contents of
the tree object, and add copies of the signed objects that tell what
tree object the archive ought to have.  E.g. if you start from a
signed tag, "git cat-file tag $tag" output would allow you to learn the
object name of the tagged object, and to verify the PGP signature
embedded in the tag, but it is likely that the tagged object is a
commit, not a tree, so you'd also need to include "git cat-file
commit $tag^{commit}".  So you'd store the raw contents of the tag
(so that we have a hash-protected record of commit object name), and
the commit (so that we have a hash-protected record of tree object
name).
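
To illustrate what "hash-protected record" means here, a minimal
sketch in Python, assuming a SHA-1 repository.  The helper names
(git_object_id, header_field) and the sample tag and commit contents
are made up for illustration; they are not part of git.  The object
name is the hash of "<type> <size>\0" followed by the raw contents,
which is what "git hash-object -t <type>" computes.

```python
import hashlib


def git_object_id(obj_type: str, raw: bytes) -> str:
    """Object name git would give `raw` when stored as `obj_type` (SHA-1 repo)."""
    header = f"{obj_type} {len(raw)}\0".encode()
    return hashlib.sha1(header + raw).hexdigest()


def header_field(raw: bytes, key: str) -> str:
    """Return the first header value for `key` ('object' in a tag, 'tree' in a commit)."""
    headers = raw.split(b"\n\n", 1)[0]  # headers end at the first blank line
    for line in headers.split(b"\n"):
        name, _, value = line.partition(b" ")
        if name == key.encode():
            return value.decode()
    raise ValueError(f"no '{key}' header found")


# Hypothetical raw contents, shaped like "git cat-file commit" and
# "git cat-file tag" output (signature omitted for brevity).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
commit_raw = (
    f"tree {EMPTY_TREE}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "initial\n"
).encode()
tag_raw = (
    f"object {git_object_id('commit', commit_raw)}\n"
    "type commit\n"
    "tag v1.0\n"
    "tagger A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "release\n"
).encode()

commit_name = header_field(tag_raw, "object")  # what the tag points at
tree_name = header_field(commit_raw, "tree")   # what the commit records
```

Because the tag names the commit by its hash and the commit names the
tree by its hash, storing both raw objects is enough to get from the
signature to the tree object name.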

You as the recipient will find these in the tarball:

 - the files that are supposed to be the contents of tree X.

 - the raw contents of the commit C that is supposed to record the
   tree X.

 - the raw contents of the tag T that is supposed to point at the
   commit C.

Starting from the contents of tag T, which is PGP signed, you know
that the signer wanted to call commit C with the name of the tag T.
Then, given the raw contents that allegedly are from commit C, you
can "git hash-object -t commit" them to verify that they indeed hash
down to C (hence, it is what the signer wanted to give you), and find
the name of the tree object X the commit records.  And when you added
all the blobs contained in the tarball (and nothing else) to the
index and ran write-tree on the resulting index, you would know what
tree object the tarball contained, and if it hashes down to X, you
know that the cryptographic hash chain starting from PGP signature
on T attests that that tarball matches what the signer wanted you
to have.
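
The chain described above can be sketched as follows, again in Python
and again only as an illustration under stated assumptions: a SHA-1
repository, the signature on T already verified by other means (gpg
or ssh), only regular non-executable files (mode 100644), and no
empty directories.  The function names and the in-memory "files"
layout are hypothetical; a real implementation would read the
tarball and handle symlinks, executables and so on.

```python
import hashlib


def git_object_id(obj_type: str, raw: bytes) -> str:
    """Object name git would give `raw` when stored as `obj_type` (SHA-1 repo)."""
    return hashlib.sha1(f"{obj_type} {len(raw)}\0".encode() + raw).hexdigest()


def header_field(raw: bytes, key: str) -> str:
    """Return the first header value for `key` in a raw tag or commit."""
    for line in raw.split(b"\n\n", 1)[0].split(b"\n"):
        name, _, value = line.partition(b" ")
        if name == key.encode():
            return value.decode()
    raise ValueError(f"no '{key}' header found")


def tree_id(entries: dict) -> str:
    """Tree object name for a nested dict: bytes value = file, dict value = subdirectory."""
    def sort_key(item):
        name, value = item
        # git orders tree entries as if directory names ended in "/"
        return name.encode() + (b"/" if isinstance(value, dict) else b"")

    body = b""
    for name, value in sorted(entries.items(), key=sort_key):
        if isinstance(value, dict):
            mode, oid = b"40000", tree_id(value)
        else:
            mode, oid = b"100644", git_object_id("blob", value)
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    return git_object_id("tree", body)


def archive_matches(tag_raw: bytes, commit_raw: bytes, files: dict) -> bool:
    """Check T -> C -> X against the extracted files (signature on T assumed verified)."""
    # The alleged commit must hash down to the C that tag T names.
    if git_object_id("commit", commit_raw) != header_field(tag_raw, "object"):
        return False
    # The files must hash down to the tree X that commit C records.
    return tree_id(files) == header_field(commit_raw, "tree")


# Hypothetical archive contents and matching raw objects.
files = {"README": b"hi\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
commit_raw = (
    f"tree {tree_id(files)}\n"
    "author A U Thor <author@example.com> 0 +0000\n"
    "committer A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "initial\n"
).encode()
tag_raw = (
    f"object {git_object_id('commit', commit_raw)}\n"
    "type commit\ntag v1.0\n"
    "tagger A U Thor <author@example.com> 0 +0000\n"
    "\n"
    "release\n"
).encode()

ok = archive_matches(tag_raw, commit_raw, files)
tampered = archive_matches(tag_raw, commit_raw, {**files, "README": b"evil\n"})
```

Changing any file, or swapping in a different commit, breaks one of
the two comparisons, so only the signature on T needs to be checked
cryptographically.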