Hi, all:

I've been mulling over the idea of submitting and verifying
cryptographic patch attestation data in a way that would be both useful
and unobtrusive -- and I would like to propose a mechanism and a
proof-of-concept script for your discussion.

# TL;DR

1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git
2. cd korg-helpers
3. ./get-lore-mbox.py 20200225051307.6401-1-keescook@xxxxxxxxxxxx -aT
4. gpg --locate-keys kees@xxxxxxxxxx
5. ./attest-patches.py -c v4_20200224_keescook_chromium_org.mbx -tF

    PASS | [PATCH v4 1/6] x86/elf: Add table to document READ_IMPLIES_EXEC
    PASS | [PATCH v4 2/6] x86/elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
    PASS | [PATCH v4 3/6] x86/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
    PASS | [PATCH v4 4/6] arm32/64, elf: Add tables to document READ_IMPLIES_EXEC
    PASS | [PATCH v4 5/6] arm32/64, elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
    PASS | [PATCH v4 6/6] arm64, elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
    ---
    All patches passed attestation:
      Attestation-by: Kees Cook <kees@xxxxxxxxxx> (pgp:8972F4DFDC6DC026)

# Preamble

PGP is already commonly used by the kernel.org community for signing
and verifying git tags when working with pull requests.
However, even though PGP was developed with email in mind, it is rarely
used for attesting patches submitted via email, for a number of good
reasons:

- PGP support in email clients remains poor
- inline PGP is generally disliked because it interferes with the code
  review process (and can mess up patches due to needing to escape
  single dashes)
- MIME-wrapped PGP frequently doesn't survive the trip due to being
  mangled by MTAs, mailing list software, etc., and may also interfere
  with the code review process due to MIME-wrapping actual patches
- delegated trust is a hard problem to solve and doesn't easily scale
  up to the level of thousands of contributors

For these reasons, PGP-based attestation of emailed patches has never
really taken off in the kernel community (or, in fact, in most other
communities using email-based workflows).

# Proposal

I would like to propose integrating PGP-based patch attestation into
the tooling we are building to automate a lot of tasks performed by the
kernel.org maintainer community. As some of you know, I've put together
get-lore-mbox, which is a client-side tool written with the goal of
making it easier to retrieve patches from the lore.kernel.org archival
system. While get-lore-mbox remains a work in progress, it has been
generally well received by the kernel development community, and its
usefulness should only grow as it is honed to handle various corner
cases around this patch workflow.

I suggest that we start using lore.kernel.org as a mechanism to record
and retrieve patch attestation data, using a special pseudo-list I set
up for this purpose: https://lore.kernel.org/signatures/.

## Main concepts

In each patch submission there are three distinct parts of interest:

- basic patch metadata (author, title, date)
- patch description, which becomes the git commit message
- the patch itself

There is frequently other information included with the patch that is
largely superfluous (diffstat, series versioning info, etc).
It is of no particular interest to us because it does not become part
of the git commit and therefore is not useful for attestation purposes.

This separation into three component parts is important, because the
patch description is routinely edited by maintainers to add various
trailer metadata (Signed-off-by, Acked-by, etc). Patches may also end
up edited for typos, formatting, or ease of merging, so what ends up
committed into git repositories by maintainers may differ from patches
as sent to the mailing lists.

### Three hashes per patch

If you look at the contents of the patch attestation message
(https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/),
you will notice a YAML-style formatted document with a series of three
hashes. Let's take the first one as an example:

    2a02abe0-215cf3f1-2acb5798:
      i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
      m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
      p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865

The source of these hashes is the following patch:
https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@xxxxxxxxxxxx/

To split the patch into its three components we can use the following
command:

    curl -s https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@xxxxxxxxxxxx/raw \
      | git mailinfo m p > i

The three files are:

- m: the commit message
- p: the patch (plus superfluous surrounding content)
- i: author, email, subject, date

We can immediately calculate the m hash, as it requires no munging:

    $ sha256sum m
    215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0  m

To calculate the "p" hash, we first need to remove any surrounding junk
that isn't just the patch itself. The goal is to get the exact same
content as produced by "git diff".
If we remove all lines preceding the first "diff" and then everything
following the content of the last hunk, we get the p hash:

    $ vi p
    $ sha256sum p
    2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865  p

We cannot use the contents of the "i" file verbatim because it includes
the Date: header, which is modified by git-send-email. We therefore
only hash the Author, Email, and Subject lines:

    $ egrep '^(Author|Email|Subject)' i | sha256sum
    2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e  -

We then take the first 8 characters of each of the i, m, and p hashes,
in that order, to create the attestation-ID: 2a02abe0-215cf3f1-2acb5798.

### One attestation document per many patches

The same process is repeated for all patches in the series. The
resulting YAML document is then PGP-signed and sent to
signatures@xxxxxxxxxx. The metadata of that message is intentionally
kept to a minimum in order to minimize PII-sensitive data and reduce
the potential for GDPR removal requests.

Since the content of this document is extremely structured, we can
develop a simple public-inbox filter that should almost completely
eliminate the possibility of spam entering the archive. We can also
create a simple ingestion URL that accepts POST requests containing
attestation data, as an alternative to sending out the attestation
email.

## Verification process

To perform verification, the same three hashes are generated for each
patch being attested, along with the i{8}-m{8}-p{8} attestation-id. We
then use that attestation-id to query the signatures archive on
lore.kernel.org:

    https://lore.kernel.org/signatures/?q=2a02abe0-215cf3f1-2acb5798

For each returned result, we perform PGP validation, and if it passes
(meaning that we have the key in the local keyring and it is assigned
sufficient trust), we then parse the message contents.

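For illustration, the hashing steps described above could be sketched in
Python roughly as follows. This is not how attest-patches.py is actually
implemented: all function names are hypothetical, the inputs are assumed
to be the m, p, and i files written by "git mailinfo", and the trimming
rule (cut at the last "-- " signature separator) is an assumption that a
real tool would need to handle more carefully.

```python
import hashlib
import re


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest, like the first column of sha256sum."""
    return hashlib.sha256(data).hexdigest()


def info_hash(info: bytes) -> str:
    """Hash only the Author/Email/Subject lines of the 'i' file.

    Mirrors: egrep '^(Author|Email|Subject)' i | sha256sum
    The Date line is skipped on purpose, as described above.
    """
    kept = [ln + b"\n" for ln in info.split(b"\n")
            if re.match(rb"(Author|Email|Subject)", ln)]
    return sha256_hex(b"".join(kept))


def trim_patch(patch: bytes) -> bytes:
    """Drop content before the first 'diff' line and after the last hunk.

    ASSUMPTION: trailing junk starts at the last '-- ' line (the signature
    separator). This is a naive stand-in for the manual 'vi p' step.
    """
    lines = patch.split(b"\n")
    starts = [n for n, ln in enumerate(lines) if ln.startswith(b"diff ")]
    if not starts:
        raise ValueError("no diff found in patch content")
    lines = lines[starts[0]:]
    seps = [n for n, ln in enumerate(lines) if ln == b"-- "]
    if seps:
        lines = lines[:seps[-1]]
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\n".join(lines) + b"\n"


def attestation_id(i_hash: str, m_hash: str, p_hash: str) -> str:
    """Join the first 8 hex characters of the i, m, and p hashes."""
    return "-".join(h[:8] for h in (i_hash, m_hash, p_hash))


def query_url(att_id: str) -> str:
    """Build the signatures archive search URL shown above."""
    return f"https://lore.kernel.org/signatures/?q={att_id}"


if __name__ == "__main__":
    # The three hashes quoted earlier in this message.
    i = "2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e"
    m = "215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0"
    p = "2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865"
    print(attestation_id(i, m, p))  # prints 2a02abe0-215cf3f1-2acb5798
```

The m hash needs no helper beyond sha256_hex() applied to the raw bytes
of the m file, since it requires no munging.
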
A patch is considered to be "passing attestation" if:

- PGP validation is successful
- We find an entry with the same three hashes
- The email address in the "From:" field of the email being verified
  matches one of the UIDs on the key used to sign the attestation
  document

This should be sufficient to provide strong assurance that the patch
and all significant commit metadata are identical between the system
where attestation was generated and the system where attestation was
checked.

# The attest-patches.py proof of concept

The provided proof-of-concept script is able to both create and verify
patch attestation data. It should handle the most obvious malicious
corner cases that I was able to think of, but it hasn't been
scrutinized, which is why it probably shouldn't be used for real work
until more people have a chance to weigh in on both the script and the
overall concept.

## The submitter workflow

As I envision it, the submitter workflow would look as follows:

- the developer runs "git-format-patch"
- the developer reviews the changes and makes any last-minute edits
  they deem necessary before submitting their work to the
  list/maintainer
- the developer executes "git-send-email"
- the developer runs "attest-patches -a *.patch"
- the developer sends attestation.eml to signatures@xxxxxxxxxx (or the
  tool auto-POSTs it to the submission URL, as mentioned)

There can even be a fairly simple wrapper around git-send-email that
would perform attestation as part of the "sending patches" stage.

## The reviewer workflow

The reviewer does not need to concern themselves with attestation until
they are ready to apply the patches to their git tree. When that moment
comes:

- the maintainer runs get-lore-mbox -aA (-A is not implemented yet)
- get-lore-mbox performs attestation before generating the am-ready
  mbox
- if attestation passes, get-lore-mbox adds two trailers to each patch:
  "Attestation-by:" and "Attestation-verified:".
  In our example case those are:

    Attestation-by: Kees Cook <kees@xxxxxxxxxx> (pgp:8972F4DFDC6DC026)
    Attestation-verified: Konstantin Ryabitsev <konstantin@xxxxxxxxxxxxxxxxxxx>

- if attestation does not pass, get-lore-mbox can provide some basic
  explanation behind the failure before aborting:
    - attestation not found in the archive
    - the PGP key does not have sufficient trust
    - attestation exists but not all hashes pass, in which case the
      tool can show which hashes failed verification
    - etc

If attestation is not found in the archive (e.g. the submitter didn't
bother submitting it), the maintainer can request that it be generated
and submitted after the fact (e.g. by rerunning "git-format-patch", or
using their "Sent" folder, etc).

For obvious and trivial patches, the maintainer may forgo
checking/requiring attestation entirely. That said, if a subsystem
adopts attestation requirements, it should stick to requiring it on all
submitted patches as a matter of principle.

# Thoughts?

Okay, what do you all think? I believe this scheme has the following
merits:

- it is opt-in and can be adopted by individual subsystem maintainers
- it builds on top of the PGP trust framework already used extensively
  by the kernel developers
- it doesn't litter mailing lists with non-human-readable attestation
  junk
- it doesn't require that attestation data is created at the time when
  patches are submitted for review; the maintainer can request that it
  be provided at a later time when they are ready to apply the series
  to their git tree and want attestation data for the final sanity
  check and record-keeping
- all attestations are recorded in the public-inbox "signatures" feed
  that can be mirrored along with all other public-inbox repositories
  on lore.kernel.org

Downsides:

- we aren't solving the problem of delegated trust, which will continue
  to be the hardest part behind any distributed development effort

I would greatly appreciate any feedback.

Best,
-K