Re: Commit SHA1 == SHA1 checksum?

Thanks everyone for the responses.  I think Junio’s comment summed up what I was asking nicely:

> As much or as little trust you have
> in SHA-1 in validating tarball.tar with its known SHA-1 checksum,
> you can trust to the same degree that the commit that is pointed by
> a tag is what the person who signed (with GPG) the tag wanted the
> tag to point at, and in turn the trees and blobs in that commit are
> what the signer wanted to have in that tagged commit, ad infinitum,
> in the space dimension.  At the same time, a commit object records
> the hash of the commit objects that are its parents, the whole
> history of the project going back to inception can be trusted to the
> same degree in the time dimension.
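
For readers who want to see that chaining concretely, here is a minimal Python sketch of how git derives object names.  It relies only on the documented object format (SHA-1 over a `<type> <size>\0` header plus the body).  The helper names, the file name `hello.txt`, and the fixed author line are made up for illustration, and the tree helper assumes its entries are already in the order git expects.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Name an object the way git does: SHA-1 of '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) entries; assumes they are pre-sorted."""
    return b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
        for mode, name, hex_id in entries
    )


def commit_body(tree_id, parent_ids, message):
    """Serialize a commit; identity and timestamp are fixed, made-up values."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines.append("author A U Thor <author@example.com> 0 +0000")
    lines.append("committer A U Thor <author@example.com> 0 +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def snapshot(content: bytes, parents=()):
    """Build blob -> tree -> commit for a one-file project; return the commit id."""
    blob_id = git_object_id("blob", content)
    tree_id = git_object_id("tree", tree_body([("100644", "hello.txt", blob_id)]))
    return git_object_id("commit", commit_body(tree_id, list(parents), "demo"))


# Space dimension: changing one byte of a file changes the commit id.
first = snapshot(b"hello\n")
tampered_first = snapshot(b"hell0\n")

# Time dimension: a child built on the tampered parent gets a different id too,
# even though the child's own file content is identical.
second = snapshot(b"hello, world\n", parents=[first])
second_on_tampered = snapshot(b"hello, world\n", parents=[tampered_first])

print(first != tampered_first, second != second_on_tampered)
```

Because each name is a hash over content that embeds the names beneath it, trusting one commit id extends to everything reachable from it, to whatever degree SHA-1 itself is trusted.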

In our case, the initial trust doesn’t come from a PGP signature; it comes (at least for now) from having cloned the package repository from GitHub.  So you trust GitHub by cloning over https, and you trust the Spack maintainers to merge only safe things into `develop` or a release branch.  A package description in the repo might pin some versions to specific commits, like this:

	https://raw.githubusercontent.com/spack/spack/5ff72ca/var/spack/repos/builtin/packages/acts/package.py

We use the specified commits to build packages from source, or to create (hopefully) reproducible binary packages.  As in the PGP case, you’re given a commit hash from some trusted source.  The question was really whether you can rely on that hash the way you would rely on a SHA-1 checksum, and it seems like you can.

That said, I do still have one more question: how soon will git notice that a given repo has been corrupted or tampered with (insofar as SHA-1 can detect that)?  On checkout?

RE: Johannes:
> (How could you even contemplate that it does not? It is the most obvious way to protect the cloner.)

I have always assumed this was the case but never could find anything in the docs saying explicitly what Junio said above.  It is hard for me to imagine git *not* working this way, but I’ve been asked this question by enough of our package maintainers that I thought I’d bring it up here.

RE: Phil:

> Hopefully Todd will be able to clarify if that 'archive vs tag' cross
> check was part of the question, or whether it was primarily focussed on
> the internal Git checks for correctness during clone and fsck.

It wasn’t part of the original question.  I was really just asking whether git guarantees that a fresh `git clone` of some commit actually has the stated commit hash.  I realize there’s no relation between the SHA-1 of a commit and the SHA-1 (or any other hash) of its tarball; it’d be a pretty bad hash function if there were.
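
As a rough illustration of checking a clone against a pinned commit, here is a small Python sketch.  The function names are hypothetical, and it assumes the `git` executable is on the PATH.  It only compares the name that `HEAD` resolves to with the pin; it does not re-hash stored objects, which is what `git fsck` is for.

```python
import re
import subprocess

# A pin is only useful as a checksum if it is the full 40-hex-digit name;
# an abbreviated hash identifies a commit for convenience, not for security.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_sha1(pin: str) -> bool:
    """True if pin is a full, lowercase, 40-character SHA-1 object name."""
    return FULL_SHA1.fullmatch(pin) is not None


def checkout_matches_pin(repo_dir: str, pinned_commit: str) -> bool:
    """Compare the commit HEAD resolves to in repo_dir with the pinned hash."""
    if not is_full_sha1(pinned_commit):
        raise ValueError("pin must be a full 40-character lowercase SHA-1")
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", "HEAD^{commit}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == pinned_commit
```

A caller would clone, check out the pinned commit, and refuse to build unless `checkout_matches_pin(repo_dir, pin)` returns `True`.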

That said, we are still trying to work out some practical *and secure* way to mirror git commits as a simple download.  I think we need to generate the tarballs ourselves and add their sha256 checksums to the package.  GitHub generates archives too, and its archive generation logic has changed in the past, as Junio describes below.  It’s messy because it requires another checksum that may change, but I don’t see a way around it.  We can’t just tar up a git repo: tar and compression tools can have vulnerabilities, and we want to checksum any input we pass to them.
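
The "checksum before it reaches tar" step could look roughly like the Python sketch below.  The function names are hypothetical; the idea is simply that nothing is handed to an archive or decompression tool until the file's sha256 matches the value recorded in the package.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_sha256(path: str, expected_sha256: str) -> str:
    """Return path only if its sha256 matches; raise before any tool sees it."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"sha256 mismatch for {path}: got {actual}")
    return path
```

Extraction would then only ever be called on the return value of `require_sha256(...)`, so a mirrored tarball that does not match the recorded checksum is rejected before it is parsed.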

Thanks again for all the helpful responses.

-Todd



> On Feb 6, 2022, at 1:33 PM, Philip Oakley <philipoakley@iee.email> wrote:
> 
> On 06/02/2022 20:02, Junio C Hamano wrote:
>> Philip Oakley <philipoakley@iee.email> writes:
>> 
>>> I think part of Todd's question was how the tag and uncompressed archive
>>> 'checksums' (e.g. hashes) relate to each other and where those
>>> guarantees come from.
>> There is no such linkage, and there are no guarantees.  The trust
>> you may or may not have on the PGP key that signs the tag and the
>> checksums of the tarball is the only source of such assurance.
>> 
>> More importantly, I do not think there can be any such linkage
>> between the Git tree and release tarball for a few fundamental
>> reasons:
>> 
>> * We add generated files to "git archive" output when creating the
>>   release tarball for builder's convenience, so if you did
>> 
>>       rm -fr temp && git init temp
>>       tar Cxf temp git-$VERSION.tar
>>       git -C temp add . && git -C temp write-tree
>> 
>>   the tree object name that you get out of the last step will not
>>   match the tree object of the version from my archive (interested
>>   parties can study "make dist" for more details).
>> 
>> * Even if we did not add any files to "git archive" output when
>>   creating a release tarball, a tarball that contains all the
>>   directories and files from a given git revision is *NOT* unique.
>>   We do not add randomness to the "git archive" output, just to
>>   make them unstable, but we have made fixes and improvements to
>>   the archive generation logic in the past, and we do reserve the
>>   rights to do so in the future.  And it is not just limited to
>>   "git archive" binary, but how it is driven, e.g. "tar.umask"
>>   settings can affect the mode bits.
> Thanks for the clarification.
> 
> Thus what trust there is, is via the two PGP signatures, rather than
> directly between the tarball and the git repo.
> --
> Philip
> 




