Our cumbersome mailing list workflow (was: Re: [PATCH 0/6] repack_without_refs(): convert to string_list)

Michael Haggerty <mhagger@xxxxxxxxxxxx> · Tue, 25 Nov 2014 01:28:24 +0100

On 11/21/2014 07:00 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> I don't think that those iterations changed anything substantial that
>> overlaps with my version, but TBH it's such a pain in the ass working
>> with patches in email that I don't think I'll go to the effort of
>> checking for sure unless somebody shows interest in actually using my
>> version.
>>
>> Sorry for being grumpy today :-(
> 
> Is the above meant as a grumpy rant to be ignored, or as a
> discussion starter to improve the colaboration to allow people to
> work better together instead of stepping on each other's patches?

I think I know the sentiments of the mailing list regulars well enough
that it didn't seem worthwhile to open this topic again, so I was just
letting off steam without any hope of changing anything. But since you
asked...

Let me list the aspects of our mailing list workflow that I find
cumbersome as a contributor and reviewer:

* Submitting patches to the mailing list is an ordeal of configuring
format-patch and send-email and getting everything just right, using
instructions that depend on the local environment. We saw that hardly
any GSoC applicants were able to get it right on their first attempt.
Submitting a patch series should be as simple as "git push".

* Once patches are submitted, there is no assurance that you (Junio)
will apply them to your tree at the same point that the submitter
developed and tested them.

* The branch name that you choose for a patch series is not easily
derivable from the patches as they appeared in the mailing list. Trying
to figure out whether/where the patches exist in your tree is a largely
manual task. The reverse mapping, from in-tree commit to the email where
it was proposed, is even more difficult to infer.

* Your tree has no indication of which version of a patch series (v1,
v2, etc) is currently applied.

The previous three points combine to make it awkward to get patches into
my local repository to review or test. There are two alternatives, both
cumbersome and imprecise:

  * I do "git fetch gitster", then try to figure out whether the branch
I'm interested in is present, what its name is, and whether the version
in your tree is the latest version, then "git checkout xy/foobar".

  * Or I save the emails to a temporary directory (awkward because, Oh
Horror, I use Thunderbird and not mutt as email client), hope that I've
guessed the right place to apply them, run "git am", and later try to
remember to clean up the temporary directory.

* Once I've done that, the "supplemental" comments from the emails (the
cover letter and the text under the "---") are nowhere available in the
Git repository. So if I want to see the changes in context plus the
supplemental comments, I have to jump back and forth between email
client and Git repo. Plus I have to jump around the rest of the email
thread to see what comments other reviewers have already made about the
series.

* Following patch series across iterations is also awkward. To compare
two versions, I have to first get both patch series into my repo, which
involves digging through the ML history to find older versions, followed
by the "git am" steps. Often submitters are nice enough to put links to
previous versions of their patch series in their cover letters, but the
links are to a web-based email archive, from which it is even more
awkward to grab and apply patches. So in practice I then go back to my
email client and search my local archive for my copy of the same email
that was referenced in the archive, and apply the patch from there.
Finding comments about old versions of a patch series is nearly as much
work.

* Because of the indeterminate application point, accumulating
Signed-off-by lines, changed committer metadata, and maintainer tweaks,
the commits that make it to the official tree have different SHA-1s than
the commits in the submitter's tree, and both are different than the
commits in the tree of any reviewer who got the patches using "git am".
This makes it hard to be sure that everybody is on the same page. It
also makes it awkward for people to exchange ideas for further changes
via Git protocols in the form of patches.

* Because of the crude serialization of patches through email, it is
only possible to submit linear patch series, not merge commits.

Hmmm, I think that covers most of the problems of handling patches and
review via a mailing list.

What are some alternatives?

I did enjoy the variety of reviewing some patch series using Gerrit. It
is nice that it tracks the evolution of a patch from version to version,
and that the comments made on all versions of a patch are summarized in
a single place. This makes it easier to avoid commenting on issues that
other reviewers have already noted and easier to check that your own
comments have been addressed by later versions of the patch. On the
other hand, Gerrit seems strongly focused on individual patches rather
than on patch series (which might not match our workflow so well), the
UI is overwhelming (though I think one could get quite productive with
it if one used it every day), and the notification emails come in blizzards.

GitHub is another obvious alternative [1], free for open-source projects
albeit not open-source itself. It is very easy to use and easy to
interact with from a Git client, and also has a good API. It is super
easy to submit patches to a project using GitHub. But the GitHub user
interface (ISTM) is optimized for getting the net result of an entire
feature branch perfect, as opposed to iterating a series of patches
until each one is individually perfect (e.g., it works best when adding
patches on top of a feature branch as opposed to rebasing existing
patches). Since Git development is oriented towards perfecting each
patch, GitHub would be a bit of an impedance mismatch.

I don't think either of those systems is ideally matched to the Git
project's workflow, but in my opinion either one of them would be more
convenient for contributors and reviewers than serializing everything
through the mailing list.

Of course what is most convenient for the maintainer is of huge
importance, but I can't say much about that.

> FWIW, I liked your rationale for "many smaller steps".

Thanks.

> One small uncomfort in that approach is that it often is not very
> obvious by reading "log -p master.." alone how well the end result
> fits together.  Each individual step may make sense, or at least it
> may not make it any worse than the original, but until you apply the
> whole series and read "diff master..." in a sitting, it is somewhat
> hard to tell where you are going.  But this is not "risk" or "bad
> thing"; just something that may make readers feel uncomfortable---we
> are not losing anything by splitting a series into small logical
> chunks.

Ideally, the cover letter should provide the "big picture" rationale for
a patch series, and the individual commit messages should provide clues
about why that step is useful.

It might be a nice convention to ask people to write the "big picture"
rationale in their cover letter, separated by a "---" from non-permanent
discussion. Then the part above the "---" could be copied into the
commit message for the *merge commit* that brings the feature branch
into master. That would preserve it for posterity in a place where it is
relatively easy to find. But I am reluctant to make the process of
submitting patches even more difficult :-)

Michael

[1] Disclaimer: I work for GitHub.

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html