Re: Commit SHA1 == SHA1 checksum?

"Gamblin, Todd" <gamblin2@xxxxxxxx> · Mon, 7 Feb 2022 21:08:30 +0000

> On Mon, Feb 07, 2022 at 08:15:58AM +0000, Gamblin, Todd wrote:
>> In our case, the initial trust doesn’t come from a PGP signature — it comes
>> (at least for now) from having cloned the package repository from GitHub.
> 
> Not really the case, if you're relying on a particular commit hash, as you
> state. Once you specify a target hash, you don't really have to care where the
> repository came from -- the hash is either going to be there and be valid, or
> it's not going to be there.

Not to belabor the point, as I think we agree, but there are two clones going on in my example:

1. Spack, the package manager, is hosted on GitHub.  You clone the repository and run bin/spack out of that clone to use it.  Users will clone either `develop` (the default branch) or some release branch, but they won’t have a commit hash for that.  This is just how they get the package manager and its built-in package repo in the first place.

2. Inside the Spack repo is a repository full of package descriptions (the package.py files).  Those point to sources for things Spack can build, either by commit hash or by tarball URL and sha256.  If Spack sees a source listed by commit hash, it clones that source (at that hash) before building.

In (1), since you do not have a hash, you’re trusting that GitHub gave you the right repo and that the project maintained its branches well.  This is why I called it “initial trust”.  In (2), that trust enables you to have confidence in the hashes in the package.py files.
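
To make the second kind of trust concrete, here is a minimal sketch of checking a fetched tarball against a pinned sha256. It is not Spack's actual code: the function name `matches_pinned_sha256` and the sample bytes are hypothetical, and only Python's standard `hashlib` is assumed.

```python
import hashlib


def matches_pinned_sha256(data: bytes, pinned_hex: str) -> bool:
    """Return True only if `data` hashes to the pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return actual == pinned_hex.lower()


# Stand-in for tarball bytes fetched from a mirror we do not trust.
tarball = b"pretend this is foo-1.0.tar.gz"

# Stand-in for the digest recorded in a package description we do trust.
pinned = hashlib.sha256(tarball).hexdigest()

print(matches_pinned_sha256(tarball, pinned))         # True
print(matches_pinned_sha256(tarball + b"x", pinned))  # False
```

The check says nothing about where the bytes came from; it only says they are the bytes the pinned digest names. All of the trust sits in where the digest itself came from.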

I think we both agree that if you have a sha1 hash from a trusted source, you can be assured that what you check out at that hash is the content it names (insofar as sha1 can do that), regardless of where the repo came from.
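
A rough sketch of why a commit hash pins the content, assuming git's SHA-1 object format (the id is the SHA-1 of a `<type> <size>\0` header followed by the object body). The helper names, the single-file tree, and the placeholder author line are all hypothetical; real git does this internally and handles much more.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>' + NUL + body, as git does for loose objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_with_one_file(name: str, blob_id: str) -> bytes:
    """Tree body with a single regular file: mode, name, NUL, raw 20-byte id."""
    return b"100644 " + name.encode() + b"\0" + bytes.fromhex(blob_id)


def commit_body(tree_id: str, message: str) -> bytes:
    """Commit body with a fixed placeholder author so the result is repeatable."""
    who = "A U Thor <author@example.com> 0 +0000"
    return f"tree {tree_id}\nauthor {who}\ncommitter {who}\n\n{message}\n".encode()


def commit_id_for(file_content: bytes) -> str:
    """Hash blob -> tree -> commit for a repository holding one README."""
    blob = git_object_id("blob", file_content)
    tree = git_object_id("tree", tree_with_one_file("README", blob))
    return git_object_id("commit", commit_body(tree, "initial"))


# The commit id depends on the tree id, which depends on the blob id,
# so changing a single byte of the file changes the commit id.
print(commit_id_for(b"hello\n") == commit_id_for(b"hello\n"))  # True
print(commit_id_for(b"hello\n") == commit_id_for(b"hellO\n"))  # False
```

Because each level names the one below it by hash, a repository that contains the pinned commit id with different content would need a SHA-1 collision somewhere in that chain.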

-Todd

> On Feb 7, 2022, at 5:15 AM, Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> wrote:
> 
> On Mon, Feb 07, 2022 at 08:15:58AM +0000, Gamblin, Todd wrote:
>> In our case, the initial trust doesn’t come from a PGP signature — it comes
>> (at least for now) from having cloned the package repository from GitHub.
> 
> Not really the case, if you're relying on a particular commit hash, as you
> state. Once you specify a target hash, you don't really have to care where the
> repository came from -- the hash is either going to be there and be valid, or
> it's not going to be there.
> 
> It only matters where the person who picked that hash cloned the repository
> from and what steps they made to verify that it is a legitimate commit. If "I
> cloned this repository from github" is sufficient for your needs, then that's
> fine. The alternative is to use PGP verification, but in either case once you
> pick a hash to use, you can rely on git to do all the rest.
> 
>> That said, I guess I do still have one more question — how soon will git
>> notice that a given repo is corrupted/tampered with (insofar as sha1 can do
>> that)?  On checkout?
> 
> Yes. I've asked this question before as well:
> https://lore.kernel.org/git/20190829141010.GD1797@xxxxxxxxxxxxxxxxxxxxx/
> 
> The relevant bit:
> 
>    Then yes, there is no need to fsck. When the objects were received on
>    the server side (by push) and then again when you got them from the
>    server (by clone), their sha1s were recomputed from scratch, not
>    trusting the sender at all in either case.
> 
> -K