Hi Josh,

as discussed at the GitMerge, I am trying to come up with tooling that will allow for substantially less tedious navigation between the local repository, the mailing list, and what ends up in the `pu` branch.

That tooling would *still* not do much to lower the barrier of entry for contributing to Git, as it would *still* not address the problem that mails sent from the most prevalent desktop mail client, as well as mails sent from the most prevalent web mail client, are simply and unceremoniously dropped. (This problem was acknowledged by quite a few nods even at the Contributors' Summit...) But still, we decided to start *somewhere* and this tooling is what we agreed on.

It is quite a bit harder going than I would like: as we have figured out, the Subject: line is not a good way to link the commits with the original mails containing the patches, as commit messages are modified before being pushed often enough to make this a fragile matching.

So I thought maybe the From: line (from the body, if available, otherwise from the header) in conjunction with the "Date:" header would work. But a preliminary study shows that there are 336 From: + Date: combinations in the Git mailing list archive that are not unique. 71 of these are shared by three or more mails, and 9 are even shared by more than 10 mails. This is bad!

Unsurprisingly, the top 10 of these cases were obviously caused by the builtin `git am` bug where it would not reset the author date properly. Surprisingly, though, there were a few cases from 2005, too. I had a quick look to find out what the culprit was (looking at the 17-strong patch series "Documentation fixes in response to my previous listing" by Nikolai Weibull), but I am at a loss there: the mail claims to be sent by git-send-email and the patches appear to be generated by git-format-patch as of v0.99.9l, neither of which had a Date:-related bug back in that time frame.
My best guess is that the patches were mishandled by a tool similar to rebase -i (which entered Git only at v1.5.3). For details, see:

http://public-inbox.org/git/11340844841342-git-send-email-mailing-lists.git@xxxxxxxxxxxxxxxxxxxxxx/

(this is also an example where public-inbox' thread detection went utterly wrong, including way too many mails in the "thread")

There was even a case of duplicated Date: headers in 2012. Now, this case is very curious, as there have been 7 mails with identical Date: header, but it was not a 6-strong patch series. Instead, it was a 4-strong patch series that needed three iterations before it was accepted, and the identical Date: header appears only in v2's patches (*not* in its cover letter) and it *disappeared* in v3's 4/4, where it was set *back* by a week (to the Date: it had in v1). For details, see

http://public-inbox.org/git/cover.1354693001.git.Sebastian.Leske@xxxxxxxxxxx/

and

http://public-inbox.org/git/cover.1354324110.git.Sebastian.Leske@xxxxxxxxxxx/

and

http://public-inbox.org/git/b115a546fa783b4121d118bb8fdb9270443f90fa.1353691892.git.Sebastian.Leske@xxxxxxxxxxx/

This last example also demonstrates a very curious test case for a different difficulty in trying to reconstruct lost correspondences: the patch series was applied *twice*, independently of each other.

First, on the day v3 was submitted, it was applied on top of v1.8.1-rc0 (as commits ee26a6e2b8..dd465ce66f), although it was not merged until v1.8.1-rc3.

22 days later, it was reapplied on top of maint so it could enter v1.8.0.3 (back then, Git still had "patchlevel" versions): c2999adcd5..008c208c2c.

As you can see, there is a many-to-many relationship here, even if you do leave the *original* branch out of the picture entirely.
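For reference, the From: + Date: uniqueness check mentioned above could be sketched roughly like this. This is not the script I actually ran, just a minimal illustration; the function names `author_key` and `non_unique_keys` are made up for this mail, and it assumes the mails are available as raw strings and that an in-body From: line, if present, is the very first line of the body.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def author_key(raw_mail):
    """Return (author address, Date: header) for one raw mail.

    Hypothetical helper: prefers an in-body "From:" line over the header.
    """
    msg = message_from_string(raw_mail)
    author = msg.get("From", "")
    payload = msg.get_payload()
    # Only plain single-part mails are handled in this sketch.
    if isinstance(payload, str):
        lines = payload.lstrip().splitlines()
        if lines and lines[0].lower().startswith("from:"):
            author = lines[0][len("from:"):]
    address = parseaddr(author.strip())[1].lower()
    return address, msg.get("Date", "").strip()


def non_unique_keys(raw_mails):
    """Map each (address, date) key shared by two or more mails to its count."""
    counts = Counter(author_key(mail) for mail in raw_mails)
    return {key: n for key, n in counts.items() if n > 1}
```

Comparing the Date: header as a verbatim string is a deliberate simplification; normalizing time zones would only make more mails collide, not fewer.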
Will keep you posted,
Dscho

P.S.: I used public-inbox.org links instead of commit references to the Git repository containing the mailing list archive, because the format of said Git repository is so unfavorable that it was determined very quickly in a discussion between Patrick Reynolds (GitHub) and myself that it would put totally undue burden on GitHub to mirror it there (compare also Carlos Nieto's talk at GitMerge titled "Top Ten Worst Repositories to host on GitHub").