First off, thanks for the awesome response, Peff, and Sverre and Michael as well. Great stuff, and plenty that I had not thought of. On Wed, May 4, 2011 at 4:42 AM, Jeff King <peff@xxxxxxxx> wrote: > On Tue, May 03, 2011 at 07:36:51PM -0400, Richard Peterson wrote: > >> Here are some possible semantics you could assign to signing a commit hash: >> >> * Making a verifiable claim of authorship of a commit >> * Making a verifiable claim to have reviewed a commit or set of commits >> * Making a verifiable claim to have approved a commit or set of commits for >> some purpose >> * Making some other verifiable claim about a commit TBD by your workflow >> * Making a verifiable claim to have reviewed or approved the entire tree >> under the commit > > Yeah, all of those make sense in certain workflows. But with the > exception of authorship verification, they are not things you would want > to do at _commit_ time, Even authorship could be claimed after commit time too, for that matter. > but rather something you say later about a > commit. So I think fundamentally you are not interested in adding > signatures to git commits themselves, but rather about making statements > about commits that happen to be signed. Which is good, because your > problem is much easier. :) > > The nice thing is that git gives you a stable, cryptographically > verifiable identifier for the commit. So all you have to do is mention > it along with some metadata, sign it, and then store it somewhere. > > The first two parts can be as simple as something like: > > (git rev-parse --verify HEAD > echo "I reviewed this and it meets some standard X." > ) | gpg --sign > > where probably you would want to define some kind of parsable metadata > format for your particular workflow. > > For storage, you basically have three options: > > 1. Somewhere completely outside of git. There's no reason this needs > to be stored in git at all, depending on your workflow. It may be > simpler to keep it in some database related to your review system > (in fact, you may not doing anything cryptographic at all, but > merely have a separate review system with a central database that > mentions commits by sha1). I see this as a useful option considering some poor souls in my organization use Subversion, and we could factor out the audit / review workflow to not depend on a single version control system. On the other hand, it makes sense to keep data actually within git if it uses a git internal identifier as a key, and its useful to operate on it with the git tool set. > > 2. In git tags. You can already do this with: > > git tag -s -m "I reviewed this" HEAD > > But tags aren't a good fit for a workflow that signs every commit > (some of them perhaps even multiple times!). You end up with lots > of tag refs. Right - one of the reasons I don't like tags for this. Tags really just don't fit the bill, unfortunately. > > 3. In git notes. You can do something like: > > (git rev-parse --verify HEAD > echo "I reviewed this" > ) | gpg --sign -a | > git notes add -F - HEAD > > though you'd probably want to be a little more complex, and handle > lists of signed notes for each commit. And you may want to store > these in a separate notes ref from the default one. I had looked at this option, but had failed to see the usefulness of using a different ref. I was worried about cluttering things up, overloading the intended purpose of notes, and so forth. I wasn't really sure if notes were intended to be general purpose storage for systematic, structured data. My inclination was to do this outside notes, or even in a parallel implementation to notes, factoring out the common parts. I suppose that looking at notes as somewhat of a free-for-all obviates this need. Is this really what notes are for? > > The advantage of notes are that they are designed for lots of > per-commit storage, and can be accessed fairly efficiently. That was my other concern about notes - performance. Not sure how notes are stored, but I certainly trust you that they're efficient. > > So now you have your review storage system (or authorship, or whatever > metadata you want to stick in there). You can peek at it manually, of > course, when you suspect something is not right. But you probably also > want to do automatic things, like making sure nothing goes into some > branch "foo" that isn't signed with an authorship note. > > Assuming you are storing with git notes (if you are using some external > system, replace the call to git-notes below with whatever database > lookup you would want), you could use a pre-receive hook that did > something like: > > git rev-list $old..$new | > while read commit; do > git notes show $commit >tmp > gpg --verify tmp >data 2>siginfo || die "$commit: signature is bad" > # ugh, is there really no better way to get this info from gpg? See? We need functions for this stuff! I'll share whatever I come up with, and maybe it will be useful in general. > perl -lne 'print $1 if /Good signature from "(.*)"/ siginfo >signer > git show --format="%an <%ae>" $commit >author > cmp author signer || die "$commit: signer and committer don't match" Yes this needs to be handled robustly. The signer would need to be told at sign-time if his signature didn't match. > test "`head -n 1 data`" = $commit || > die "$commit: signed commit does not match" > done > > And obviously that is hacked together and you would want something more > robust, Thank you - this is all solid stuff to get me started. > and you'd need to handle the web of trust for the signing keys > somehow (though I think that is external to this script, and is about > setting up the desired keyring). But I hope it gives a sense of what you > can do. You could also replace gpg completely with something like > openssl using x.509 certs, if that makes more sense to your > organization. You read my mind. Everybody in my organization has a set of x.509 certs on smartcards. That's phase 2 of my project. > > Developers would have to make a note and push their notes tree first, You mean for hook / verification purposes? Or is there some underlying reason to push notes first? > and then push their actual commits into a branch (and you might want to > do some verification on the notes they push, like checking that entries > for commit $X actually contains signatures for $X, or that the signer > identity matches some ssh credential, or that the pusher isn't deleting > any signatures or erasing note history). > > I suspect you already thought through some of this already. But I wanted > to start with first principles, because I really don't think this is a > _git_ problem as much as it is a _workflow_ problem. So it's important > to first define the workflow you want, and then think about how git can > help. Stable commit identifiers already provide much of the basis. I > think notes provide a nice storage format that is efficient and > push-able to other repos (though in a centralized shop, some other > database might make sense, too). What really remains to be done is: > > 1. Define the metadata format that encapsulates what you want to say > about commits. > > 2. Write scripts to help developers and reviewers make these notes, > and verify them. Write hooks to implement policy on letting > commits into certain branches, as above. > > And both of those happen outside of git (though if you write them in a > generic enough form, I'm sure people on the list would be very happy to > see them shared). I'll be sure to share. > >> There are 200 developers working on a financial trading system, and each of >> them has the opportunity to slip malicious code into the project. When the >> final release is prepared, the project lead signs the tip commit, thus >> signing the whole tree. Now it is discovered that someone did slip some >> malicious code in. How do you audit the system? Could higher levels of >> individual accountability have discouraged this scenario? > > I like this example. It shows that signing a commit is not really > meaningful by itself; you have to understand the semantics of that > signature (and maybe they're included as comments in the tag object, or > maybe it is assumed by your organization's workflow). > > In the case of the kernel, Linus signing a commit with a tag implicitly > means "I think what is in this tree and everything before it is good, so > you should feel comfortable using it" (or at least insofar as you trust > Linus). > > But it doesn't have to be that way. Your project lead signing may mean > "this is good and we should ship it". But developers signing commits may > simply mean "I promise that I wrote the changes between this commit's > tree and its parent". Those are all signatures of commits, but they mean > very different things; the key is adding metadata to know which is > which. > >> I've seen it argued that a proper SSH setup and user management are the key. >> These are good for security and access control, but not for some durable >> form of accountability. > > Right. You are trusting the server's records, not cryptography. The main > advantage is that it's efficient and easy to set up. :) The main reason this doesn't work for me is that codebases are passed around my organization like hand-me-down clothes. It is not unheard of to get the entire repository for a critical application delivered from one shop to another on CD. We need to be able to verify the integrity of a repository entirely independently of any outside information. The only centralized source of trust in our organization is the certificate authority. Now my big question to ponder: what do do when the CA expires a cert? Hmm... > >> It seems that creating a signed tag is the same as signing a commit. There >> are a few problems, though. Tags don't provide a secure means of asserting >> the type of signature being applied to the commit hash. That is - is the >> hash signed because someone is claiming authorship? Because they are >> asserting the integrity of the entire tree? Because they have reviewed the >> code? Because they reviewed a certain subset of the tree? Of course there's >> also the issue that tags live in a cluttered namespace. Signing a commit is >> essentially a different thing from providing a name for a commit. Using tags >> just to sign commits requires a glut of tag names. > > Again, metadata. Say what you mean in the free-form content of the tag. > For the kernel, there is nothing to be said. Linus signing tags has a > well-known meaning in the community. But in an organization signing for > a lot of different reasons, you would want the signed data to say why it > was signed. > >> I propose expanding the concept of tags, or alternately creating a new >> concept which subsumes the existing tag concept. I'll call this new concept >> a "sig" for the purposes of this discussion. The concept of a sig cross-cuts >> the concept of a tag. >> >> A tag signs the commit hash. A sig signs a SHA1-based absolute commit >> reference with a (possibly null) string concatenated to it. For instance, a >> sig might sign the following string: > > A tag can already include arbitrary data. > > In fact, tags basically do what you want already; it's just that storing > one tag ref per commit is going to be ugly. It might make sense to > replace the ad-hoc gpg signatures I used in my examples above with tag > objects, and then store the tag object in the notes tree. > >> "0b9deecf625677cf44058a42c2abd7add5167e81^0 author" >> which would mean that the signor is claiming authorship of that individual >> commit. (Suggestions for notating a single commit are welcome. "^0" seemed >> natural.) > > See? You're defining metadata now. :) > >> * What on earth does it mean to tag a range of commits? With commit ranges >> being siggable, and tags containing sigs, what does it mean to tag a range >> of 10 commits, for instance? Is that desirable? Does it make any sense >> whatsoever? Does it hurt anything if it happens? > > It's slightly more efficient. If I wrote 10 commits, I can either sign > each individually saying "I wrote this", or I can make a single > signature showing them all. The tradeoff is that parsing and verifying > metadata becomes a lot more complex. But crytographically speaking, a > range is not ambiguous; > >> * Performance? I think it would be extremely quick to verify a bunch of >> sigs, but I don't know. Maybe I'm not thinking clearly about it. >> Fortunately, sigs can be ignored entirely and need not affect things. > > Compared to usual git operations, no, it's not quick. But you don't have > to verify all the time. You can verify commits when they enter your > repo, or when you're interested in some aspect of them, or when you > suspect something fishy is going on. You don't have to do it on every > rev-list. Good point. I had thought it would be something to see every time I run git-log, but I suppose it makes perfect sense to do this thing in the nightlies or some other rarer occasion. Thanks, Richard Peterson -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html