Re: how to (integrity) verify a whole git repo

Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx> · Tue, 21 Apr 2020 12:19:56 -0400

On Tue, Apr 21, 2020 at 04:42:16PM +0200, Christoph Anton Mitterer wrote:
> Taking again the kernel as an example:
> If I clone the repo (or fsck it later), then all I know is that there
> was no corruption, provided all the tips are correct, since they start
> the chain of hash sums to all other objects.

Notably, there is normally only one branch in torvalds/linux.git, and 
that's "master". So, there's only one tip.

> But an attacker could have just forged these tips.
> So for checking authenticity, I need to verify some signatures on them
> 
> Now if I check e.g. Linus' signature on tag v5.6, I should know that
> everything earlier than that tag (in the tree, not chronologically) is
> authentic.

Yes, verifying a signature on a tag tells you that all commits reachable 
from it are bit-for-bit identical to those on Linus's workstation at the 
time he created the signature.
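
The reason one signed tag vouches for everything beneath it is that Git 
object IDs form a hash chain: a commit's ID is a hash over its bytes, 
which include its tree ID and its parent IDs. Below is a minimal Python 
sketch of that idea, assuming the classic SHA-1 object format 
(`<type> <size>`, a NUL byte, then the body). The helper names, the 
single `README` file and the fixed author line are made up for 
illustration; on a real repository you would run `git verify-tag` on the 
tag and let `git fsck` check the objects.

```python
import hashlib
from typing import List, Optional

def git_hash(obj_type: str, body: bytes) -> str:
    """Object ID as Git computes it in SHA-1 repos: sha1 of '<type> <size>', NUL, body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def make_commit(tree_id: str, parent_id: Optional[str], message: str) -> str:
    """Build a minimal commit object (hypothetical fixed identity) and return its ID."""
    ident = "A U Thor <author@example.com> 0 +0000"
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_hash("commit", ("\n".join(lines) + "\n").encode())

def build_history(file_versions: List[bytes]) -> List[str]:
    """Commit each version of a single file in order; return the commit IDs."""
    ids: List[str] = []
    parent: Optional[str] = None
    for i, content in enumerate(file_versions):
        blob = git_hash("blob", content)
        # A tree entry is "<mode> <name>", NUL, then the raw 20-byte blob ID.
        tree = git_hash("tree", b"100644 README\0" + bytes.fromhex(blob))
        parent = make_commit(tree, parent, f"commit {i}")
        ids.append(parent)
    return ids

original = build_history([b"v1\n", b"v2\n", b"v3\n"])
tampered = build_history([b"v1 (evil)\n", b"v2\n", b"v3\n"])

# Only the oldest commit's content differs, yet every later ID changes too,
# including the tip that a signed tag would point at.
print(original[-1] != tampered[-1])  # True
```

The signature itself only covers the tag object; it reaches everything 
below because the tag names the tip commit by its ID.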

> But not e.g. any commits on top of v5.6 (which are neither signed
> themselves nor protected by another tag "above" them).

This is mostly true, yes.

> Nor any commits not reachable from v5.6, e.g. later stable patches
> like anything above v5.5 (which is again below v5.6) up to
> v5.5.13, which is not.

Stable commits would be in the stable tree, and those tags are signed by 
Greg Kroah-Hartman.

> So from my understanding, to use only commits that are authenticated by
> the kernel upstream developers, I'd need to verify all these tips and
> throw away everything which is not reachable from one of them.
> 
> Is that somehow possible?

You probably don't care about commits that arrive between releases, so 
effectively you are already doing that? Even if you have loose objects 
that aren't reachable from your current tip (e.g. you only care about 
objects in the stable branch linux-5.6.y), it's not like they are going 
to "poison" your tree, so removing them is just a garbage collection 
operation at best.
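
If you did want to keep only what is reachable from the tips you have 
verified, the underlying operation is an ordinary graph walk, roughly 
what `git prune` does (real Git also treats reflog entries and recent 
objects as reachable, which this ignores). A minimal sketch over a 
hypothetical in-memory graph; the object names and the `reachable` 
helper are invented for illustration:

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def reachable(graph: Dict[str, List[str]], tips: Iterable[str]) -> Set[str]:
    """Breadth-first walk from the given tips; returns every object they reach."""
    seen: Set[str] = set()
    queue = deque(tips)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        queue.extend(graph.get(obj, []))  # objects with no references have no entry
    return seen

# Hypothetical object graph: each object maps to the objects it references
# (a tag to its commit, a commit to its parent and tree). "v5.6" stands in
# for a tag whose signature you have already checked.
graph = {
    "v5.6": ["c3"],
    "c3": ["c2", "t3"],
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "x1": ["c2", "t4"],  # stray commit on top of c2, not reachable from v5.6
}
all_objects = set(graph).union(*graph.values())
keep = reachable(graph, ["v5.6"])
garbage = all_objects - keep
print(sorted(garbage))  # ['t4', 'x1']
```

The stray `x1` does no harm to `keep`: it points *into* verified history, 
but nothing verified points at it, which is why removing such objects is 
housekeeping rather than a security measure.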

## Minor attestation rant

I would argue that your premise of "authenticity" is wrong. The best 
that we are currently able to offer is a guarantee that, at the point 
where the tag was signed, the tree is bit-for-bit identical to the tree 
as it exists on Linus Torvalds' (or Greg KH's) workstation.

However, both Linus and Greg merge code from tens of thousands of other 
contributors and it's important to keep in mind that their tag 
signatures do not offer any kind of attestation proof of the code's 
actual authorship or origin. Looking for such proof would be 
near-impossible -- even if we had a universally accepted mechanism to do 
cryptographic attestation of all patches and commits, normal maintainer 
operations would necessarily break this chain:

- maintainers insert their own trailers into commit messages
  (Signed-off-by, Tested-by, Acked-by, etc).
- maintainers reorder and edit patches that they receive from individual 
  contributors -- for typos, minor stylistic cleanups, extra comments, 
  etc.
- maintainers routinely rebase patches they receive before they can 
  submit them to be merged into mainline.
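
To make the first and last points concrete: a signature covers the exact 
bytes of a commit object, and the commit ID is a hash of those bytes, so 
any trailer a maintainer adds, or any new parent a rebase gives it, 
produces a different commit that the contributor never signed. A minimal 
Python sketch, assuming the SHA-1 object format; the identities, message 
and parent IDs are placeholders:

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of Git's empty tree
IDENT = "J Doe <jdoe@example.com> 0 +0000"               # placeholder identity

def commit_id(parent: str, message: str) -> str:
    """SHA-1 object ID of a minimal commit with the given parent and message."""
    body = (f"tree {EMPTY_TREE}\nparent {parent}\n"
            f"author {IDENT}\ncommitter {IDENT}\n\n{message}\n").encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

old_base = "a" * 40  # placeholder: upstream head the contributor worked on
new_base = "b" * 40  # placeholder: newer upstream head after a rebase
msg = "Fix frobnicator\n\nSigned-off-by: J Doe <jdoe@example.com>"

submitted = commit_id(old_base, msg)
with_trailer = commit_id(old_base, msg + "\nAcked-by: M Aintainer <m@example.com>")
rebased = commit_id(new_base, msg)

# A real amend or rebase would also rewrite the committer line; even without
# that, each variant is a different object from the one that was signed.
print(len({submitted, with_trailer, rebased}))  # 3
```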

Full code attestation is possible in projects where all changes arrive 
through forks, pull requests, and merges -- for example, many 
Git**b/Gerrit projects could be set up to require full cryptographic 
attestation of commits. However, it would be 
impossible to force this development paradigm onto the Linux kernel -- 
it would be extremely disruptive and require massive individual effort 
to overhaul every maintainer's workflow. Furthermore, many maintainers 
would reject this approach because they would disagree with the main 
premise behind the effort -- that cryptographically signing every commit 
offers enough tangible benefit to be worth it.

Let me expound on the last point. There are some 15,000 personas who 
have committed code to the Linux kernel (a persona could be the same 
person committing code from different commercial entities -- 
jdoe@xxxxxxxxxx vs jdoe@xxxxxxxxxx). Even if we assume that each commit 
is signed, we then must have a way to perform some kind of meaningful 
verification, right?

- Where do we get all the public keys required for such a task?
- How do we handle cases where a key has expired or, worse, has been 
  revoked by the developer? This can't invalidate their past commits, 
  because it's impossible to re-sign those.
- How do we bootstrap distributed trust without relying on someone being 
  a Fundamentally Non-corruptible Person? It's certainly not me -- I 
  have close relatives living under, shall we say, regimes with loose 
  standards when it comes to personal freedoms.
- How much trust should we be putting into cryptographic signatures?  
  Linux developers aren't necessarily that much better about keeping 
  their workstations protected against malicious attacks, so they are 
  just as vulnerable to having their private keys stolen as anyone else.

For this reason, Linux maintainers use either a zero-trust approach, or 
a last-leg trust approach:

- Submaintainers don't put much trust into *who* wrote the code and 
  review all submissions they receive as potentially containing security 
  bugs (intentional or not); their job is to review the code and pass it 
  up the chain to maintainers.
- If maintainers receive pull requests from submaintainers, then they 
  *may* check cryptographic signatures on the trees they pull. I am 
  trying to encourage all maintainers to do this, and I've been working 
  to introduce patch attestation so that maintainers preferring to work 
  with patch series as opposed to pull requests can have similar 
  functionality.
- Linus checks all signatures on trees he pulls from non-kernel.org 
  locations. Unfortunately, I've not been able to convince him that he 
  should check them on stuff he pulls from kernel.org as well (and he 
  has his own reasons for that).

So, all of this is to say that as the person cloning linux.git you are 
merely the last link in the chain of "trusting the maintainer before 
you." In your case that maintainer is Linus (or Greg KH), and you have 
to agree that, in the end, "having a tree that is bit-for-bit identical 
with what Linus has" is a pretty good assurance that it's as "authentic 
Linux" as it gets.

-K