Re: [Tagging Commits] feedback / discussion request

Jeff King <peff@xxxxxxxx> · Wed, 4 May 2011 04:42:13 -0400

On Tue, May 03, 2011 at 07:36:51PM -0400, Richard Peterson wrote:

> Here are some possible semantics you could assign to signing a commit hash:
> 
> * Making a verifiable claim of authorship of a commit
> * Making a verifiable claim to have reviewed a commit or set of commits
> * Making a verifiable claim to have approved a commit or set of commits for
> some purpose
> * Making some other verifiable claim about a commit TBD by your workflow
> * Making a verifiable claim to have reviewed or approved the entire tree
> under the commit

Yeah, all of those make sense in certain workflows. But with the
exception of authorship verification, they are not things you would want
to do at _commit_ time, but rather something you say later about a
commit. So I think fundamentally you are not interested in adding
signatures to git commits themselves, but rather in making statements
about commits, where the statements happen to be signed. Which is good,
because your problem is much easier. :)

The nice thing is that git gives you a stable, cryptographically
verifiable identifier for the commit. So all you have to do is mention
it along with some metadata, sign it, and then store it somewhere.

The first two parts can be as simple as something like:

  (git rev-parse --verify HEAD
   echo "I reviewed this and it meets some standard X."
  ) | gpg --sign

where probably you would want to define some kind of parsable metadata
format for your particular workflow.
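
As a rough sketch of what such a format might look like (the field
names "commit", "claim" and "note" and both helper names are made up
for illustration, not any git convention), a couple of shell functions
could build the statement and pull fields back out:

```shell
# Hypothetical statement format: one "key value" pair per line.
# make_statement <commit-id> <claim> <free-form note>
make_statement () {
	printf 'commit %s\n' "$1"
	printf 'claim %s\n' "$2"
	printf 'note %s\n' "$3"
}

# get_field <key>: print the value of the first line on stdin that
# starts with "<key> ".
get_field () {
	sed -n "s/^$1 //p" | head -n 1
}

# Intended use (needs git and gpg, so left commented out here):
#   make_statement "$(git rev-parse --verify HEAD)" review \
#     "meets standard X" | gpg --sign
```

The point is only that the claim type ends up inside the signed data,
so a verifier can tell "I wrote this" from "I reviewed this".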

For storage, you basically have three options:

  1. Somewhere completely outside of git. There's no reason this needs
     to be stored in git at all, depending on your workflow. It may be
     simpler to keep it in some database related to your review system
     (in fact, you may not be doing anything cryptographic at all, but
     merely have a separate review system with a central database that
     mentions commits by sha1).

  2. In git tags. You can already do this with:

       git tag -s -m "I reviewed this" HEAD

     But tags aren't a good fit for a workflow that signs every commit
     (some of them perhaps even multiple times!). You end up with lots
     of tag refs.

  3. In git notes. You can do something like:

       (git rev-parse --verify HEAD
        echo "I reviewed this"
       ) | gpg --sign -a |
       git notes add -F - HEAD

     though you'd probably want to be a little more complex, and handle
     lists of signed notes for each commit. And you may want to store
     these in a separate notes ref from the default one.

     The advantage of notes is that they are designed for lots of
     per-commit storage, and can be accessed fairly efficiently.

So now you have your review storage system (or authorship, or whatever
metadata you want to stick in there). You can peek at it manually, of
course, when you suspect something is not right. But you probably also
want to do automatic things, like making sure nothing goes into some
branch "foo" that isn't signed with an authorship note.

Assuming you are storing with git notes (if you are using some external
system, replace the call to git-notes below with whatever database
lookup you would want), you could use a pre-receive hook that did
something like:

  git rev-list $old..$new |
  while read commit; do
    git notes show $commit >tmp
    # --decrypt verifies a signed message and writes its payload to stdout
    gpg --decrypt tmp >data 2>siginfo || die "$commit: signature is bad"
    # ugh, is there really no better way to get this info from gpg?
    perl -lne 'print $1 if /Good signature from "(.*)"/' siginfo >signer
    git log -1 --format="%an <%ae>" $commit >author
    cmp author signer || die "$commit: signer and author don't match"
    test "`head -n 1 data`" = $commit ||
      die "$commit: signed commit does not match"
  done

And obviously that is hacked together and you would want something more
robust, and you'd need to handle the web of trust for the signing keys
somehow (though I think that is external to this script, and is about
setting up the desired keyring). But I hope it gives a sense of what you
can do. You could also replace gpg completely with something like
openssl using X.509 certs, if that makes more sense for your
organization.
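
On the "no better way" grumble in the hook: gpg can write
machine-readable status lines to a file descriptor with --status-fd,
which is probably less fragile than scraping the human-readable output.
A minimal sketch, assuming the status line of interest has the form
"[GNUPG:] GOODSIG <keyid> <user id>"; the helper name
signer_from_status is made up here:

```shell
# signer_from_status: read gpg status lines on stdin and print the user
# ID from the first GOODSIG line. Fails if there is no such line.
signer_from_status () {
	sed -n 's/^\[GNUPG:\] GOODSIG [^ ]* //p' | head -n 1 | grep .
}

# Intended use inside the hook loop (needs gpg and a keyring, so it is
# left commented out here; gpg's own exit status still has to be checked):
#   gpg --status-fd 3 --decrypt tmp >data 2>/dev/null 3>status ||
#     die "$commit: signature is bad"
#   signer_from_status <status >signer ||
#     die "$commit: no good signature"
```

Comparing the key id or fingerprint rather than the user ID string
might be more robust still, but that depends on how you manage keys.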

Developers would have to make a note and push their notes tree first,
and then push their actual commits into a branch (and you might want to
do some verification on the notes they push, like checking that entries
for commit $X actually contain signatures for $X, or that the signer
identity matches some ssh credential, or that the pusher isn't deleting
any signatures or erasing note history).

I suspect you have already thought through some of this. But I wanted
to start with first principles, because I really don't think this is a
_git_ problem as much as it is a _workflow_ problem. So it's important
to first define the workflow you want, and then think about how git can
help. Stable commit identifiers already provide much of the basis. I
think notes provide a nice storage format that is efficient and
push-able to other repos (though in a centralized shop, some other
database might make sense, too). What really remains to be done is:

  1. Define the metadata format that encapsulates what you want to say
     about commits.

  2. Write scripts to help developers and reviewers make these notes,
     and verify them.  Write hooks to implement policy on letting
     commits into certain branches, as above.

And both of those happen outside of git (though if you write them in a
generic enough form, I'm sure people on the list would be very happy to
see them shared).

> There are 200 developers working on a financial trading system, and each of
> them has the opportunity to slip malicious code into the project. When the
> final release is prepared, the project lead signs the tip commit, thus
> signing the whole tree. Now it is discovered that someone did slip some
> malicious code in.  How do you audit the system? Could higher levels of
> individual accountability have discouraged this scenario?

I like this example. It shows that signing a commit is not really
meaningful by itself; you have to understand the semantics of that
signature (and maybe they're included as comments in the tag object, or
maybe it is assumed by your organization's workflow).

In the case of the kernel, Linus signing a commit with a tag implicitly
means "I think what is in this tree and everything before it is good, so
you should feel comfortable using it" (or at least insofar as you trust
Linus).

But it doesn't have to be that way. Your project lead signing may mean
"this is good and we should ship it". But developers signing commits may
simply mean "I promise that I wrote the changes between this commit's
tree and its parent". Those are all signatures of commits, but they mean
very different things; the key is adding metadata to know which is
which.

> I've seen it argued that a proper SSH setup and user management are the key.
> These are good for security and access control, but not for some durable
> form of accountability.

Right. You are trusting the server's records, not cryptography. The main
advantage is that it's efficient and easy to set up. :)

> It seems that creating a signed tag is the same as signing a commit.  There
> are a few problems, though.  Tags don't provide a secure means of asserting
> the type of signature being applied to the commit hash. That is - is the
> hash signed because someone is claiming authorship? Because they are
> asserting the integrity of the entire tree? Because they have reviewed the
> code? Because they reviewed a certain subset of the tree? Of course there's
> also the issue that tags live in a cluttered namespace. Signing a commit is
> essentially a different thing from providing a name for a commit. Using tags
> just to sign commits requires a glut of tag names.

Again, metadata. Say what you mean in the free-form content of the tag.
For the kernel, there is nothing to be said. Linus signing tags has a
well-known meaning in the community. But in an organization signing for
a lot of different reasons, you would want the signed data to say why it
was signed.

> I propose expanding the concept of tags, or alternately creating a new
> concept which subsumes the existing tag concept. I'll call this new concept
> a "sig" for the purposes of this discussion. The concept of a sig cross-cuts
> the concept of a tag.
> 
> A tag signs the commit hash. A sig signs a SHA1-based absolute commit
> reference with a (possibly null) string concatenated to it. For instance, a
> sig might sign the following string:

A tag can already include arbitrary data.

In fact, tags basically do what you want already; it's just that storing
one tag ref per commit is going to be ugly. It might make sense to
replace the ad-hoc gpg signatures I used in my examples above with tag
objects, and then store the tag object in the notes tree.

> "0b9deecf625677cf44058a42c2abd7add5167e81^0 author"
> which would mean that the signor is claiming authorship of that individual
> commit. (Suggestions for notating a single commit are welcome. "^0" seemed
> natural.)

See? You're defining metadata now. :)

> * What on earth does it mean to tag a range of commits? With commit ranges
> being siggable, and tags containing sigs, what does it mean to tag a range
> of 10 commits, for instance? Is that desirable? Does it make any sense
> whatsoever? Does it hurt anything if it happens?

It's slightly more efficient. If I wrote 10 commits, I can either sign
each individually saying "I wrote this", or I can make a single
signature showing them all. The tradeoff is that parsing and verifying
metadata becomes a lot more complex. But cryptographically speaking, a
range is not ambiguous.
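
One way to sidestep range parsing, at the cost of a longer payload,
might be to sign the expanded list of commit ids (what git rev-list
prints for the range) instead of the range expression itself. Checking
whether a commit is covered is then a simple line lookup. The helper
name and the "data" file name below are illustrative only:

```shell
# is_covered <commit-id> <payload-file>: succeed if the (already
# verified) payload lists the commit id on a line by itself.
is_covered () {
	grep -qxF -e "$1" "$2"
}

# Signing side (needs git and gpg, so left commented out here):
#   git rev-list "$old..$new" | gpg --sign -a
# Checking side, after the signature has been verified into "data":
#   is_covered "$commit" data || die "$commit: not covered by signature"
```

Matching whole lines matters: an abbreviated or partial id should not
count as covered.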

> * Performance? I think it would be extremely quick to verify a bunch of
> sigs, but I don't know. Maybe I'm not thinking clearly about it.
> Fortunately, sigs can be ignored entirely and need not affect things.

Compared to usual git operations, no, it's not quick. But you don't have
to verify all the time. You can verify commits when they enter your
repo, or when you're interested in some aspect of them, or when you
suspect something fishy is going on. You don't have to do it on every
rev-list.
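
Since a commit id never changes what it refers to, a successful check
can also be remembered. A minimal sketch of that idea, where
verify_commit stands in for whatever expensive check you settle on and
the cache file location is an arbitrary choice:

```shell
# Cache of commit ids that already passed verification, one per line.
VERIFIED_CACHE=${VERIFIED_CACHE:-.verified-commits}

# Placeholder for the real (slow) signature check; replace it.
verify_commit () {
	false
}

# verify_once <commit-id>: consult the cache first, and record successes.
verify_once () {
	if test -f "$VERIFIED_CACHE" &&
	   grep -qxF -e "$1" "$VERIFIED_CACHE"
	then
		return 0
	fi
	verify_commit "$1" || return 1
	echo "$1" >>"$VERIFIED_CACHE"
}
```

The cache is only as trustworthy as the file it lives in, and it will
not notice a key being revoked later, so you would still want a way to
throw it away and re-verify from scratch.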

-Peff