Patch attestation, attempt #2



Hello, all:

In continuation of my patch attestation work, I would like to submit for 
your consideration an alternative implementation that is both simpler 
and more likely to find wider adoption due to the introduction of
domain-based attestation that is similar to DKIM in nature. While I 
consider domain-based attestation less robust than individual developer 
attestation, it has the upside of providing an easy way to make all 
patches coming from entire domains tamper-evident, once they receive the 
in-header signature. In other words, a patch coming from 
developer@xxxxxxxxxx can be verified to have been mailed via 
and not altered since leaving the SMTP server.

Domain-level attestation works alongside developer-based attestation, 
and can also be done via email headers. It can be done either with PGP, 
or with individual ED25519 keys, distributed via web key directories.

This proposal comes with a proof-of-concept repository implementing all 
the signing and verification functionality. It can be cloned from the 
following address:

Below is the proposal itself, taken from README.rst. All examples 
mentioned in it can be replicated using the POC repository.


Header-Based Patch Attestation
Author: Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx>
Status: Alpha, soliciting comments

Projects participating in decentralized development continue to use
RFC-2822 (email) formatted messages for code submissions and review.
This remains the only widely accepted mechanism for code collaboration
that does not rely on centralized infrastructure maintained by a single
entity, which necessarily introduces a single point of dependency and
a single point of failure.

RFC-2822 formatted messages can be delivered via a variety of means. To
name a few of the more common ones:

  - email
  - usenet
  - aggregated archives (e.g. public-inbox)

Among these, email remains the most widely used transport mechanism for
RFC-2822 messages, most commonly delivered via subscription-based
services (mailing lists).

Email and end-to-end attestation
There are two commonly used standards for cryptographic email
attestation: PGP and S/MIME. When it comes to patches sent via email,
there are significant drawbacks to both:

  - Mailing list software may modify email body contents to add
    subscription information footers, causing message attestation to
    fail.
  - Attestation via detached MIME signatures may not be preserved by
    mailing list software that aggressively quarantines attachments.
  - Inline PGP attestation generally frustrates developers working with
    patches due to extra surrounding content and the escaping it
    performs for strings containing dashes at the start of the line for
    canonicalization purposes.
  - Only the body of the message is attested, leaving metadata such as
    "From", "Subject", and "Date" open to tampering. Git uses this
    metadata to formulate git commits, so leaving them unattested is
    suboptimal (they can be duplicated into the body of the message,
    but git format-patch will not do this by default).
  - PGP key distribution and trust delegation remains a difficult
    problem to solve. Even if PGP attestation is available, the
    developer on the receiving end of the patches may not make any use
    of it due to not having the sender's key in their keyring.
  - S/MIME certificates are increasingly difficult to obtain for
    developers not working in corporate environments. At the time of
    writing, only two commercial CAs continue to provide this service --
    and only one does it for free.

For these reasons, end-to-end attestation is rarely used in communities
that continue to use email as their main conduit for code submissions
and review.

Email and domain-level attestation
Since unsolicited emails (SPAM) frequently forge headers in order to
appear to be coming from trusted sources, most major service providers
have adopted DKIM (RFC-6376) to provide cryptographic attestation for
header and body contents. A message that originates from will
contain a "DKIM-Signature" header that attests the contents of the
following headers (among others):

  - from
  - date
  - message-id
  - subject

The "DKIM-Signature" header also includes a hash of the message body
(bh=) that is included in the final verification hash. When a DKIM
signature is successfully verified using a public key that is published
via DNS records, this provides a degree of assurance that the
email message has not been modified since leaving

Just as with PGP and S/MIME attestation, this has important problems when it
comes to patches sent via mailing lists:

  - If the "sender" header is included in the attestation, the DKIM
    signature will no longer verify due to mailing lists necessarily
    rewriting it for bounce handling.
  - ML software commonly modifies the subject header in order to insert
    list identification (e.g. ``[some-topic]``). Since the "subject"
    header is almost always included into the list of headers attested
    by DKIM, this causes DKIM signatures to fail verification.
  - ML software also routinely modifies the message body for the
    purposes of stripping attachments or inserting list subscription
    metadata. Since the bh= hash is included in the final signature
    hash, this results in a failed DKIM signature check.

Even if all of the above does not apply and the DKIM signature is
successfully verified, body canonicalization routines mandated by the
DKIM RFC may result in a false-positive successful attestation for
patches. The "relaxed" canonicalization instructs that all consecutive
whitespace is collapsed, so patches for languages like Python or GNU
Make where whitespace is syntactically significant may have different
code result in the same hash.
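
To make the whitespace problem concrete, here is a minimal sketch of the
"relaxed" body canonicalization as we read it in RFC 6376 (section
3.4.4). The function name and the two sample patch bodies are
illustrative only and are not taken from the POC. The two bodies differ
in the indentation of the last line, which changes the meaning of the
Python code, yet both produce the same hash after canonicalization.

```python
import hashlib
import re


def relaxed_body(body: bytes) -> bytes:
    """Approximate DKIM "relaxed" body canonicalization (illustrative)."""
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    canon = []
    for line in lines:
        line = re.sub(rb"[ \t]+", b" ", line)   # collapse runs of whitespace
        canon.append(line.rstrip(b" "))         # drop trailing whitespace
    while canon and canon[-1] == b"":           # drop trailing empty lines
        canon.pop()
    return b"".join(line + b"\r\n" for line in canon)


# deploy() is inside the "if" block in one body and outside it in the other
inside_if = b"+    if ready:\n+        check()\n+        deploy()\n"
outside_if = b"+    if ready:\n+        check()\n+    deploy()\n"

same_hash = (hashlib.sha256(relaxed_body(inside_if)).digest()
             == hashlib.sha256(relaxed_body(outside_if)).digest())
```

Here ``same_hash`` ends up true even though the two bodies are not
byte-identical.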

DKIM works well enough for end-to-end email attestation, but has
important drawbacks for domain-level attestation of patches, especially
when they are delivered via mailing lists.

The goal of this document is to propose a scheme that would provide
cryptographic attestation for all message contents necessary for trusted
distributed code collaboration. It draws on the success of the DKIM
standard in order to adapt (and adopt) it for this purpose.

Anatomy of an email patch
A patch submitted via an RFC-2822 formatted message consists of the
following three significant parts:

  - *metadata*, which includes the Author, Email, Subject, and Date of
    the submission
  - *commit message*, which describes what the change is supposed to
    do
  - *diff content*, which is structured data that should be applied
    to the codebase in order to implement the changes proposed

Patch submissions also routinely provide additional content that may
have significance to the author or to the reviewer, but is not preserved
in the codebase after patches are applied, such as:

  - information describing changes between revisions
  - statistics about what files are changed (diffstat)
  - structured data indicating tree dependencies (base-commit)
  - author's signature and software version info
  - mailing list subscription metadata

Our goal is to provide attestation for the significant parts and ignore
the parts that are not preserved after code is committed to a git
repository.

Three hashes per patch
Instead of creating a single attestation hash, we create a separate hash
for each meaningful part of the patch submission:

  - i: patch metadata
  - m: commit message
  - p: diff content

This allows the person performing verification to identify which part of
the submission has been altered since being signed. A change to a commit
message may be explained by the addition of a ``Signed-off-by`` (or
similar) trailer, so the developer performing the review may ignore a
failure in the "m" hash if the other two hashes are passing.

Similarly, a patch that goes through a chain of maintainers will
necessarily have its commit message modified by the inclusion of various
provenance trailers. Having a separate hash for the patch content and
patch metadata provides a way to track whether or not any of the
submaintainers made changes to the patch code, or just to the commit
message, as is generally expected.

To generate the three parts, we rely on the ``git mailinfo`` command,
which does most of what we need::

    git mailinfo m p > i < email.msg

The above command will produce three files that closely match what we
are looking for, but require a bit of extra processing to remove content
that is likely to be altered in SMTP transmission.

To get the "m" hash, we take the "m" file as-is::

    sha256sum m

To get the "i" hash, we remove the "Date" header from the output,
because it can be modified by git during format-patch or send-email
stages (or, infrequently, by SMTP relays). We only take the "Author",
"Email", and "Subject" headers::

    grep -E '^(Author|Email|Subject)' i | sha256sum
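
The same two steps can be sketched in Python. This is not the POC code:
the function names are hypothetical, and the base64 encoding of the
digest is an assumption based on the look of the i= value in the sample
X-Patch-Hashes header. The inputs are the contents of the "i" and "m"
files written by ``git mailinfo``.

```python
import base64
import hashlib
import re


def b64_sha256(data: bytes) -> str:
    """Return the SHA-256 digest of data, base64-encoded (assumed encoding)."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def metadata_hash(info: bytes) -> str:
    """Hash only the Author, Email and Subject lines of the "i" file."""
    kept = [line for line in info.splitlines(keepends=True)
            if re.match(rb"(Author|Email|Subject)", line)]
    return b64_sha256(b"".join(kept))


def message_hash(message: bytes) -> str:
    """Hash the "m" file (the commit message) as-is."""
    return b64_sha256(message)
```

Because the "Date" line is filtered out, two "i" files that differ only
in their date yield the same metadata hash.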

The "p" file requires the most work, as it contains data from the "below the
cut" portion of the commit message (usually, diffstat and revision
information), plus trailing content such as signatures or mailing list
subscription info. All of this is stripped away to leave just the diff
content. Unfortunately, there is no way to do it with git itself, so we
use manual parsing of the diff structure to perform this operation.
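
As a rough illustration of what such parsing can look like, here is a
simplified sketch. It is not the routine used by the POC (which relies
on b4 for this), the names are hypothetical, and it only handles plain
text diffs. It uses the line counts in each hunk header to decide where
a hunk ends, so that a trailing signature or list footer is not mistaken
for diff content.

```python
import re

HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")
FILE_HEADERS = ("index ", "--- ", "+++ ", "old mode ", "new mode ",
                "new file mode ", "deleted file mode ",
                "similarity index ", "rename from ", "rename to ")


def extract_diff(text: str) -> str:
    """Keep diff headers and hunks; drop diffstat and trailing content."""
    kept = []
    in_diff = False
    old_left = new_left = 0          # lines still expected in current hunk
    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            kept.append(line)
            if line.startswith("\\"):    # "\ No newline at end of file"
                continue
            if not line.startswith("+"):
                old_left -= 1            # context or removed line
            if not line.startswith("-"):
                new_left -= 1            # context or added line
        elif line.startswith("diff --git "):
            in_diff = True
            kept.append(line)
        elif in_diff and (match := HUNK_RE.match(line)):
            kept.append(line)
            old_left = int(match.group(1) or 1)   # count defaults to 1
            new_left = int(match.group(2) or 1)
        elif in_diff and line.startswith(FILE_HEADERS + ("\\",)):
            kept.append(line)
        else:
            in_diff = False          # diffstat, signature, list footer, etc.
    return "".join(line + "\n" for line in kept)
```

The result can then be hashed in the same way as the other two parts.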

Why not use git patch-id?
Git provides a command to generate a "patch-id" that can be used to
quickly identify similar patches. To generate the patch-id hash, git
performs several canonicalization routines that make this hash
unsuitable for attestation purposes:

  - it collapses all repeating whitespace
  - it removes all line numbers from diff contents

It is possible for a malicious actor to create two patches that generate
identical patch-id hashes but have drastically different results when
applied to the codebase. For more info, see discussion here:


X-Patch-Hashes header
After the i, m, p hashes are generated, we insert them into the email
message as a separate header. You can use the proof-of-concept code
included to generate one yourself::

    $ ./ hashes-hdr
    Using emails/unsigned.eml as message source
    --- HEADER STARTS ---
    X-Patch-Hashes: v=1; h=sha256; i=pkD5Pg8+cndZAzQQzo3RBSOOUzZM3GYWxiFIKFGIKe0=;
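
The layout of the header can be sketched as follows. The sample output
above only shows the i= tag, so the m= and p= tag names used here are an
assumption based on the i/m/p naming of the three hashes, and both
function names are hypothetical.

```python
def format_patch_hashes(i_hash: str, m_hash: str, p_hash: str) -> str:
    """Build the header value from the three hashes (assumed tag names)."""
    return f"v=1; h=sha256; i={i_hash}; m={m_hash}; p={p_hash}"


def parse_patch_hashes(value: str) -> dict:
    """Split a "k=v; k=v" header value into a dict."""
    tags = {}
    for part in value.split(";"):
        # Split on the first "=" only, since base64 values end in "="
        key, sep, val = part.partition("=")
        if sep:
            tags[key.strip()] = val.strip()
    return tags
```

Splitting on the first "=" only matters because base64-encoded hashes
routinely end with "=" padding.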

Running POC code
The POC code is written in Python and requires an extra set of libraries
in order to work. To get going, please do the following::

    $ python3 -mvenv .venv
    $ source .venv/bin/activate
    $ pip install --upgrade pip
    $ pip install -r requirements.txt

Domain-level attestation
Once the X-Patch-Hashes header is generated and inserted into the email,
it will need to be signed in order to be useful for attestation
purposes. Adding domain-level signatures during SMTP processing is the
simplest way to accomplish this, as it would allow entire companies to
automatically attest all patches sent out via their infrastructure.

This can be easily done by introducing a patch-attestation milter that
would automatically analyze body contents and generate the
X-Patch-Hashes header if it finds that the message contains a patch
(unless this header is already present). This milter can then either
create its own cryptographic signature or let the usual DKIM-signing
infrastructure create the necessary attestation.

Using vanilla DKIM
Vanilla DKIM is well-suited for this purpose, as it was specifically
created to sign email headers. The following changes will need to be
made to the configuration for it to be useful:

  - add "x-patch-hashes" to the list of signed headers
  - ensure that "sender" is not included
  - potentially, exclude "subject" from the list of signed headers, in
    order to hedge against mailing lists that add ``[topic]`` to all
    email subjects

Here's how it looks with the POC command, using the bundled rsa.key::

    $ ./ sign-dkim
    Signing: plain DKIM
    Using emails/unsigned.eml as message source
    Using rsa.key to sign
    --- MESSAGE STARTS ---
    X-Patch-Hashes: v=1; h=sha256; i=pkD5Pg8+cndZAzQQzo3RBSOOUzZM3GYWxiFIKFGIKe0=;
    DKIM-Signature:  v=1; a=rsa-sha256; c=relaxed/simple;;; q=dns/txt; s=patches; t=1600264001; h=from : date :
     x-patch-hashes; bh=g2Sv1ZR+jIrWukzdXbqb+aeiqyFQOBLDQY6z0BBnGg4=;

Note that the b= value will be different for you since the timestamp is
included into the hashed content and will be different each time the
code runs.

This header was created by a generic DKIM implementation (dkimpy),
commonly used in production via the popular dkimpy-milter daemon.

This POC also includes a few example emails signed by the DKIM
key. You can run the POC verification yourself::

    $ ./ -m emails/korg-signed-dkim.eml verify
    Using emails/korg-signed-dkim.eml as message source
    Verifying: Plain DKIM
    PASS : identity and domain match From header
    PASS : time drift between Date and t (2 days, 23:24:18)
    PASS : DKIM signature for, s=default
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

As you can see, the verification steps will check several things:

  - that the DKIM signature passes verification (this is done as
    dictated by the RFC -- by normalizing and concatenating all signed
    headers, plus the DKIM-signature header itself, minus the signature
    content following b=)
  - that the x-patch-hashes header is included in the content attested
    by DKIM
  - that the domain (d=) and identity (i=) values match what is in the
    From: field of the email message
  - that time drift between the Date header and the timestamp of the
    signature is reasonable
  - that all patch hashes that we generate match the hashes in the
    signed header
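
The checks that do not involve cryptography can be sketched with the
Python standard library alone. This is an illustration rather than the
POC code: the function names are hypothetical, the 7-day drift limit is
an arbitrary assumption, and the signature check itself is left out.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parseaddr, parsedate_to_datetime


def parse_tags(value: str) -> dict:
    """Parse a DKIM-style "k=v; k=v" tag list, dropping folding whitespace."""
    tags = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep:
            tags[key.strip()] = "".join(val.split())
    return tags


def precheck(sig_value, from_hdr, date_hdr, max_drift=timedelta(days=7)):
    """Run the non-cryptographic checks; max_drift is an assumed limit."""
    tags = parse_tags(sig_value)
    domain = tags.get("d", "").lower()
    identity = tags.get("i", "@" + domain).lower()   # DKIM default for i=
    from_domain = parseaddr(from_hdr)[1].lower().rpartition("@")[2]
    signed = tags.get("h", "").lower().split(":")

    sent = parsedate_to_datetime(date_hdr)
    if sent.tzinfo is None:                          # e.g. a "-0000" zone
        sent = sent.replace(tzinfo=timezone.utc)
    signed_at = datetime.fromtimestamp(int(tags["t"]), timezone.utc)

    return {
        "hashes header is signed": "x-patch-hashes" in signed,
        "identity and domain match From": (
            bool(domain) and from_domain == domain
            and identity.rpartition("@")[2] == domain),
        "time drift is reasonable": abs(sent - signed_at) <= max_drift,
    }
```

Each entry in the returned dict corresponds to one PASS or FAIL line of
the kind shown in the verification output.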

Note that this check specifically excludes verifying the body hash
(bh=) value, for the reasons described in the previous section
concerning DKIM drawbacks. Also, since we excluded "subject" from the
list of signed headers, the verification will succeed even with usual
mailman-induced changes to the email content::

    $ ./ -m emails/korg-signed-dkim-with-ml-junk.eml verify
    Using emails/korg-signed-dkim-with-ml-junk.eml as message source
    Verifying: Plain DKIM
    PASS : identity and domain match From header
    PASS : time drift between Date and t (2 days, 23:24:18)
    PASS : DKIM signature for, s=default
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

However, since we include the subject of the commit (as git sees it)
into the "i" hash, any changes to the subject header that aren't extra
prefixes like ``[topic]`` will result in verification failure::

    $ ./ -m emails/korg-signed-dkim-changed-subject.eml verify
    Using emails/korg-signed-dkim-changed-subject.eml as message source
    Verifying: Plain DKIM
    PASS : identity and domain match From header
    PASS : time drift between Date and t (2 days, 23:24:18)
    PASS : DKIM signature for, s=default
    ----- ---------------
    FAIL : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    FAIL : Some or all hashes failed verification

Using the X-Patch-Sig header
There may be several reasons why you may not want to use DKIM for the
purpose of attesting the X-Patch-Hashes header:

  - you may not have sufficient control over the infrastructure
    performing DKIM signing, for example if your company uses a
    commercial upstream relayhost that performs DKIM signing for your
  - you may not want to exclude the "subject" header from your DKIM
    configuration, as it reduces the overall scope of your email
  - you may not want to rely on DNS for the purposes of public key
    lookups, since DNS records are easily spoofed (and DNSSEC adoption
    is still very low)

For these reasons, we also introduce a separate "X-Patch-Sig" header
that acts as a compatible subset of the DKIM RFC:

  - we only use the "x-patch-hashes" header, omitting the need for the
    h= record, and always normalize it as "relaxed"
  - we omit the bh= field entirely
  - we omit the v= field, since we will rely on the v= value in the
    X-Patch-Hashes header for versioning info
  - we add the m= field to indicate the signature mode (dk, wk, pgp,
    wkd, discussed below)
  - for the purposes of the POC, we hardcode the algorithm to
    ed25519-sha256, though other algorithms like rsa-sha256 or
    rsa-sha512 can be easily implemented

The signature is generated in the exact same way as the DKIM signature,
by concatenating the x-patch-hashes header and the x-patch-sig header
(after normalizing them using the "relaxed" mode), obviously excluding
the content that follows b=.
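
A minimal sketch of how the signed content can be assembled is shown
here. It assumes the "relaxed" header rules of RFC 6376 (section 3.4.2)
and, following the ed25519-sha256 definition in RFC 8463, that the
SHA-256 digest of the data is what gets signed. The function names are
hypothetical, and the Ed25519 signing step is omitted because it needs a
third-party library.

```python
import hashlib
import re


def relaxed_header(name: str, value: str) -> bytes:
    """Canonicalize one header using the DKIM "relaxed" rules."""
    value = re.sub(r"\r?\n", "", value)               # unfold continuations
    value = re.sub(r"[ \t]+", " ", value).strip(" ")  # collapse whitespace
    return f"{name.lower()}:{value}".encode("utf-8")


def signing_digest(hashes_value: str, sig_value: str) -> bytes:
    """Return the SHA-256 digest over both headers, with b= emptied."""
    # Keep the "b=" tag name but drop the signature data that follows it
    unsigned = re.sub(r"((?:^|;)\s*b=)[^;]*", r"\1", sig_value)
    data = (relaxed_header("X-Patch-Hashes", hashes_value) + b"\r\n"
            + relaxed_header("X-Patch-Sig", unsigned))
    return hashlib.sha256(data).digest()
```

With this approach the digest does not depend on the b= content or on
how the headers were folded in transit, but it changes as soon as any of
the hashes in X-Patch-Hashes change.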

Here's the result of running the POC code, using the bundled dk.key::

    $ ./ sign-dk
    Signing: X-Patch-Sig header using dk mode
    Using emails/unsigned.eml as message source
    --- MESSAGE STARTS ---
    X-Patch-Hashes: v=1; h=sha256; i=pkD5Pg8+cndZAzQQzo3RBSOOUzZM3GYWxiFIKFGIKe0=;
    X-Patch-Sig: m=dk;;; s=patches; t=1600268242;

DK Mode
The DK mode is fully compatible with the DKIM standard and will perform
the exact same DNS query to look up the public key for the selector

    $ ./ -m emails/korg-signed-dk.eml verify
    Using emails/korg-signed-dk.eml as message source
    Verifying: X-Patch-Sig (mode=dk)
    PASS : identity and domain match From header
    PASS : time drift between Date and t (4 days, 5:56:18)
    PASS : mode=dk signature verified for:,, s=patches
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

WK Mode
Instead of looking up the public key using DNS, we perform an HTTPS
lookup. This has the advantage of being more secure, but requires
caching, TTL expiration, and proxy configuration by the client, and is
more fragile because the web is less distributed than the
fault-tolerant implementation of DNS.

The query is performed to the domain name specified in the signature,
using the following rule::


The contents of the txt file are the same as the contents of the TXT
record. We have it configured for and you can perform a
verification lookup using the provided example::

    $ ./ -m emails/korg-signed-wk.eml verify
    Using emails/korg-signed-wk.eml as message source
    Verifying: X-Patch-Sig (mode=wk)
    PASS : identity and domain match From header
    PASS : time drift between Date and t (4 days, 6:18:45)
    PASS : mode=wk signature verified for:,, s=patches
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

Developer-level attestation
The domain-level attestation has significant advantages, but also
important drawbacks:

 - advantage: it allows auto-enrolling entire companies, without the
   need for individual developers to make any changes to their usual
   workflow
 - advantage: it piggybacks on the existing DKIM standard, which has
   a proven success record
 - disadvantage: it requires changes to the IT infrastructure, including
   adding a new milter daemon to the authenticated SMTP relay, which has
   security and stability implications
 - disadvantage: it requires explicit trust that the infrastructure
   performing the hashing and signing has not been compromised by
   malicious attackers
 - disadvantage: it allows someone with access to a compromised account
   to send out patches purporting to be coming from an official employee
   of the company
 - disadvantage: it is not useful to unaffiliated developers sending
   patches from generic email addresses (gmail, yahoo, hotmail, etc).

These disadvantages can be mitigated by allowing individual developers
to provide their own signatures, using the "pgp" and "wkd" modes of the
X-Patch-Sig header.

PGP mode
Many open-source projects already provide a mechanism for developers to
exchange and use PGP keys for the purposes of code attestation (e.g. via
signed git tags and git commits). We can easily use GnuPG to provide the
signature content of the X-Patch-Sig header.

Here is an example from the bundled emails/mricon-signed-pgp.eml::

    X-Patch-Hashes: v=1; h=sha256;
    X-Patch-Sig: m=pgp; i=mricon@xxxxxxxxxx; s=0xE63EDCA9329DD07E;

Since a lot of the attesting information is already embedded into the
PGP signature itself, the header structure is different from the "dk" or
"wk" mode:

  - we don't need to know the domain, since we won't be doing any
    lookups on our own (GnuPG can handle this, if configured)
  - the selector field identifies the public key ID of the certification
    subkey, for ease of lookups
  - the identity field is informational only, but can be used by GnuPG
    to perform WKD lookups, if it matches the From header (not
    implemented in the POC)
  - the timestamp field is missing, since this data is embedded into the
    PGP signature itself

On the verification side, if the key specified by the selector is
already present in the verifier's default keyring, we will verify that
the signature is GOOD, VALID, and that it is either TRUST_FULLY or

If the key is not present in the verifier's default keyring, the POC
will check if there is a matching entry in .keys/openpgp/keys/[keyid].asc,
and if so, will use .keys/openpgp/pubring.kbx for performing the
verification. In this case, TRUST_* fields are not used, as they will
always be "unknown".

In-git key distribution is discussed further below.

WKD mode
I wanted to provide a way for developers to use a WK-like mode for
public key lookups as an alternative to PGP. The signature is generated
just like for the domain-level WK mode, using the ed25519 key provided
by each individual developer.

Here's the POC running with the bundled "ingit.key"::

    $ ./ sign-wkd
    Signing: X-Patch-Sig header using wkd mode
    Using emails/unsigned.eml as message source
    --- MESSAGE STARTS ---
    X-Patch-Hashes: v=1; h=sha256; i=pkD5Pg8+cndZAzQQzo3RBSOOUzZM3GYWxiFIKFGIKe0=;
    X-Patch-Sig: m=wkd;; i=dev@xxxxxxxxxx; s=patches; t=1600270651;

It is very similar to content created in the "dk" or "wk" mode, except
the identity field includes the entire email address of the developer.

When we verify the attestation, we will do the following:

  - check if that key is available in .keys/devkey/[domain]/[local]/[selector].txt
  - if it is not present, we perform an HTTPS query to

The hashing and zbase32-encoding is taken to be compatible with
OpenPGP's WKD implementation and is done to prevent someone from easily
finding out everyone's email addresses from unprotected directory
listings.
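
For reference, the hashing step can be sketched as follows, assuming the
scheme described in the OpenPGP Web Key Directory draft: the local part
of the address is lowercased, hashed with SHA-1, and the digest is
encoded with z-base-32. The function names are hypothetical, and this
sketch only produces the hashed local part, not the lookup URL.

```python
import hashlib

# The z-base-32 alphabet
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32(data: bytes) -> str:
    """Encode bytes using the z-base-32 alphabet, 5 bits per character."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)     # pad to a multiple of 5 bits
    return "".join(ZBASE32[int(bits[i:i + 5], 2)]
                   for i in range(0, len(bits), 5))


def hashed_local_part(address: str) -> str:
    """Return the zbase32-encoded SHA-1 of the lowercased local part."""
    local = address.rpartition("@")[0].lower()
    return zbase32(hashlib.sha1(local.encode("utf-8")).digest())
```

A SHA-1 digest is 160 bits long, so the encoded value is always 32
characters.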

You can run the verification using the POC example. Here's the run
without using the in-git matching key::

    $ ./ -m emails/mricon-signed-wkd.eml verify
    Using emails/mricon-signed-wkd.eml as message source
    Verifying: X-Patch-Sig (mode=wkd)
    PASS : identity and domain match From header
    PASS : time drift between Date and t (4 days, 6:58:47)
    PASS : mode=wkd signature verified for:, i=mricon@xxxxxxxxxx, s=patches
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

Here is the same, but using the public key provided in the git
repository itself::

    $ ./ -m emails/dev-signed-wkd-ingit.eml verify
    Using emails/dev-signed-wkd-ingit.eml as message source
    Verifying: X-Patch-Sig (mode=wkd)
    Loading: WKD key from /var/home/user/work/git/patch-attestation-poc/.keys/devkey/
    PASS : identity and domain match From header
    PASS : time drift between Date and t (4 days, 7:28:47)
    PASS : mode=wkd signature verified for:, i=dev@xxxxxxxxxx, s=patches
    ----- ---------------
    PASS : metadata
    PASS : commit message
    PASS : diff content
    ----- ---------------
    PASS : All hashes verified

The structure and nature of the WKD mechanism is entirely up for
discussion (along with everything else in this proposal).

Automating developer attestation
The easiest way to automate developer attestation is by providing a
sendmail-compatible "attest-and-send" utility that can be a drop-in
command settable via git's sendemail.smtpServer config setting. It would
be automatically invoked whenever git-send-email runs and would inject
the X-Patch-Hashes and X-Patch-Sig headers before sending the emails to
the SMTP server specified via the rest of the sendemail configuration.

In addition to creating these headers, this tool can also automatically
add all emails going through it to the developer's personal public-inbox
archive that can act as a separate source of patch data in addition to
mail delivered via SMTP and mailing lists.
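
The header injection part of such a utility could look roughly like the
sketch below. The function name is hypothetical, the header values are
expected to be computed elsewhere, and reading the message from stdin
and handing it over to the SMTP server are left out. The headers are
appended at the byte level so that nothing else in the message is
reformatted.

```python
def inject_headers(raw: bytes, headers: dict) -> bytes:
    """Append extra headers to the end of the header block of a message."""
    # Detect the line ending used by the message from its first line
    first_nl = raw.find(b"\n")
    crlf = raw[max(first_nl - 1, 0):first_nl + 1] == b"\r\n"
    nl = b"\r\n" if crlf else b"\n"
    head, sep, body = raw.partition(nl + nl)
    if not sep:                        # message consisting of headers only
        head, body = raw.rstrip(b"\r\n"), b""
    extra = b"".join(f"{name}: {value}".encode("utf-8") + nl
                     for name, value in headers.items())
    return head + nl + extra + nl + body
```

The body and all existing headers pass through unchanged, which matters
because the hashes are computed over the content as the developer sent
it.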

Public keys bundled with git repos
Delegated trust is hard and securely bootstrapping your trusted
identities is even harder. There are existing proposals to include
developer keys as part of the git repository itself in order to make it
possible for someone to quickly bootstrap their keyring with trusted
identities. Obviously, this introduces a chicken-and-egg problem of
getting your source of trust from the thing you're trying to attest in
the first place. However, no mechanism short of in-person meetings is
able to provide perfect levels of assurance, so in-git key distribution
remains as good a source of bootstrap trust as any.

The implementation in this POC is naive and shouldn't be used for
serious purposes. An emerging proposal like did:git is a more
thoroughly considered approach and should probably be preferred.

Where should verification be performed
Signature verification should be performed by the maintainer evaluating
the patches they received for inclusion into the git repository. The POC
already pulls in "b4" as a dependency for the patch hashing routines,
and I intend to add the header-based verification mechanisms in a
future release of b4, once this proposal is thoroughly discussed.

Similarly, browser and other email client plugins can be written to
indicate to the developer whether the patches they are viewing pass
signature verification. If this proposal is adopted, we can come up with
implementations for Gmail, Mutt and Emacs, which should cover a
significant number of end-user tools.
