Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> writes:

> On Mon, Feb 07, 2022 at 12:57:55PM -0800, Junio C Hamano wrote:
>> You are solving a different problem: "I have this tar archive; what
>> git tree object would I get if I extract this archive to an empty
>> directory and said 'git add . && git write-tree'?".
>>
>> I agree that one is computable.
>
> So, I was brainstorming about this today, and I'm curious if you think this
> would be a useful feature to have, maybe even natively?
>
> E.g. here's a scenario:
>
> "git archive -S <signed-object>" creates an additional file that is added to
> the generated tar/zip archive -- for example, a ${prefix}.GIT_ARCHIVE_SIG. That
> file contains the raw contents of the signed tag and/or the signed commit.
>
> "git verify-archive" would look for a toplevel .GIT_ARCHIVE_SIG file. If it's
> present, it would verify the signature on these "detached" signed objects to
> get a trusted tree hash. Then it would compute the tree hash of the tar
> archive (minus the .GIT_ARCHIVE_SIG file) to see if it matches.
>
> In my mind, that would provide the following benefits over the current
> practice of detached .sig files:
>
> 1. environments like github/git.kernel.org would be able to create verifiable
>    snapshot archives using an existing set of signed objects
> 2. packagers would be able to perform cryptographic verification without
>    needing to track any extra sources like corresponding .sig files; they
>    would just need to add a build-time dependency on git (plus whatever it
>    calls for cryptographic verification, such as gnupg or openssh)
> 3. this would automatically support all git-native signature mechanisms like
>    openssh and whatever else gets added in the future
>
> Does this idea have any merit, or is it too fragile/crazy to bother?

I may choose details differently at the implementation level (instead
of an extra file, I'd see if we can add it as pax_extended_header,
for example), but I think that is workable and might even be useful,
provided I am not misunderstanding your idea. So let me try
rephrasing to see how it would work.

Given a signed commit or a signed tag that points at a commit, your
enhanced "git archive" would create a .tar file with the contents of
the tree object, and add copies of the signed objects that tell what
tree object the archive ought to have.

E.g. if you start from a signed tag, "git cat-file tag $tag" output
would allow you to learn the object name of the tagged object, and
to verify the PGP signature embedded in the tag, but it is likely
that the tagged object is a commit, not a tree, so you'd also need
to include "git cat-file commit $tag^{commit}".

So you'd store the raw contents of the tag (so that we have a
hash-protected record of the commit object name), and the commit (so
that we have a hash-protected record of the tree object name).

You as the recipient will find these in the tarball:

 - the files that are supposed to be the contents of tree X.

 - the raw contents of the commit C that is supposed to record the
   tree X.

 - the raw contents of the tag T that is supposed to point at the
   commit C.

Starting from the contents of tag T, which is PGP signed, you know
that the signer wanted to call commit C with the name of the tag T.
Then, given the raw contents that allegedly are from commit C, you
can "git hash-object -t commit" them to verify that they indeed hash
down to C (hence, they are what the signer wanted to give you), and
find the name of the tree object X the commit records.

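To make the tag-to-commit-to-tree step concrete, here is a rough
sketch in Python. It assumes a SHA-1 repository and that the PGP
signature on the tag has already been checked by some other means; the
function names (git_object_id, header_field, tree_from_tag) are made up
for illustration and are not part of git.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object name as "git hash-object -t <type>" computes it (SHA-1 repos)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def header_field(raw: bytes, key: str) -> str:
    """Return the value of the first header line named `key`.

    Headers end at the first blank line; continuation lines (such as
    those of an embedded signature) start with a space and never match.
    """
    head = raw.split(b"\n\n", 1)[0]
    for line in head.split(b"\n"):
        name, _, value = line.partition(b" ")
        if name == key.encode():
            return value.decode()
    raise ValueError(f"no '{key}' header found")


def tree_from_tag(raw_tag: bytes, raw_commit: bytes) -> str:
    """Follow tag T -> commit C -> tree X using only the raw object contents.

    Assumption: the signature inside raw_tag was verified beforehand.
    """
    if header_field(raw_tag, "type") != "commit":
        raise ValueError("tag does not point at a commit")
    expected = header_field(raw_tag, "object")
    actual = git_object_id("commit", raw_commit)
    if actual != expected:
        raise ValueError("commit contents do not hash to the tagged object")
    return header_field(raw_commit, "tree")
```

The returned value is only as trustworthy as the signature check that
preceded it; this sketch covers the hash chain, not the cryptography.
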
And when you add all the blobs contained in the tarball (and nothing
else) to the index and run write-tree on the resulting index, you
learn what tree object the tarball contains. If that tree is X, you
know that the cryptographic hash chain starting from the PGP signature
on T attests that the tarball matches what the signer wanted you to
have.
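
The tree name can also be computed without an index at all. The
following is only a sketch of that computation in Python, assuming a
SHA-1 repository; tree_id and its input format (a mapping from
slash-separated path to a (mode, content) pair) are hypothetical and
chosen for illustration.

```python
import hashlib
from typing import Dict, Tuple


def git_object_id(obj_type: str, body: bytes) -> str:
    """Object name as "git hash-object -t <type>" computes it (SHA-1 repos)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(files: Dict[str, Tuple[str, bytes]]) -> str:
    """Tree object name for a set of files.

    `files` maps a slash-separated path to (mode, content), where mode is
    a git mode string such as "100644", "100755", or "120000" (for a
    symlink, content is the link target).
    """
    entries = []
    subdirs: Dict[str, Dict[str, Tuple[str, bytes]]] = {}
    for path, (mode, data) in files.items():
        first, sep, rest = path.partition("/")
        if sep:
            # Collect members of each subdirectory for a recursive call.
            subdirs.setdefault(first, {})[rest] = (mode, data)
        else:
            name = first.encode()
            entries.append((name, mode, name, git_object_id("blob", data)))
    for dirname, members in subdirs.items():
        name = dirname.encode()
        # Git orders tree entries as if directory names ended in "/".
        entries.append((name + b"/", "40000", name, tree_id(members)))
    entries.sort(key=lambda e: e[0])
    # Each entry is "<mode> <name>\0" followed by the 20-byte binary object name.
    body = b"".join(
        mode.encode() + b" " + name + b"\0" + bytes.fromhex(oid)
        for _, mode, name, oid in entries
    )
    return git_object_id("tree", body)
```

The caller still has to map each archive member to the right git mode
and leave out anything that is not part of the tree (such as the
embedded signed objects); how to do that reliably from tar metadata is
left open here.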