Re: [PATCH v3 2/3] CodingGuidelines: hint why we value clearly written log messages

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Thu, 14 Apr 2022 16:04:59 +0200

On Wed, Apr 13 2022, Junio C Hamano wrote:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> Emily Shaffer <emilyshaffer@xxxxxxxxxx> writes:
>>
>>>> + - Log messages to explain your changes are as important as the
>>>> +   changes themselves.  Clearly written code and in-code comments
>>>> +   explain how the code works and what is assumed from the surrounding
>>>> +   context.  The log messages explain what the changes wanted to
>>>> +   achieve and why the changes were necessary (more on this in the
>>>> +   accompanying SubmittingPatches document).
>>>> +
>>>
>>> One thing not listed here, that I often hope to find from the commit
>>> message (and don't), is "why we did it this way instead of <other way>".
>>> I am not sure how to phrase it in this document, though. Maybe:
>>>
>>>   The log messages explain what the changes wanted to achieve, any
>>>   decisions that were made between alternative approaches, and why the
>>>   changes were necessary (more on this in blah blah)
>>>
>>> Or maybe "...whether any alternative approaches were considered..." fits
>>> the form of the surrounding sentence better.
>>
>> Quite valid observation.
>>
>> Documentation/SubmittingPatches::meaningful-message makes a note on
>> these points, and the above may want to be more aligned to them.
>>
>> Patches welcome, as these have long been merged to 'master/main'.
>
> Another thing.  If you (not Emily, but figuratively) haven't watched
> Victoria's talk https://www.youtube.com/watch?v=4qLtKx9S9a8 on the
> topic of clearly written commits, you should drop everything you are
> doing and go watch it.
>
> And with what we learn from it, we may be able to rewrite this part
> of the documentation much more clearly.

The slides for it are at
https://vdye.github.io/2022/OS101-Writing-Commits.pdf (not in the video
description, but at the very end of the video).

It's easy to nitpick/improve existing examples, so here goes :)

The main commit message example in that talk starts as just "Make error
text more helpful", and ends with a better version as:

	git-portable.sh: make error text more helpful

	The message “Not a valid command: <invalid command>” is
	intended to notify the user that their subcommand is invalid.
	However, when no subcommand is given, the "empty" subcommand
	results in the same message: "Not a valid command:". This does
	not clearly guide the user to the correct behavior, so print
	"Please specify a command" when no subcommand is specified.

For our CodingGuidelines I think it would be useful to have some version
of "if you can explain something with prose or tests, prefer
tests".

I.e. other things being equal I'd much prefer this version
(pseudo-patch):

	git-portable.sh: don't conflate invalid and non-existing command

	 git-portable-test.sh | 2 +-
	 1 file changed, 1 insertion(+), 1 deletion(-)

	diff --git a/git-portable-test.sh b/git-portable-test.sh
	index c8bd464..e03f4a8 100644
	--- a/git-portable-test.sh
	+++ b/git-portable-test.sh
	@@ -5,7 +5,7 @@ test_expect_failure 'usage: invalid command' '
	 '

	 test_expect_failure 'usage: no command' '
	-	test_expect_code_output 129 "Not a valid command: " ./gitportable.sh
	+	test_expect_code_output 129 "Please specify a command" ./gitportable.sh
	 '

	 test_done

It ends up basically saying the same thing, but now we're saying it with
a regression test (test_expect_code_output doesn't exist, but let's
pretend it's test_expect_code + a test_cmp-alike).
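
To make the "let's pretend" part concrete, here is a minimal sketch of
what such a helper could look like. Everything in it is an assumption
for illustration: test_expect_code_output is not part of git's test
library, fake_portable is a made-up stand-in for the script from the
slide, and comparing combined stdout+stderr is just the simplest
choice, not necessarily what a real helper should do.

```shell
#!/bin/sh

# Hypothetical helper (does not exist in git's test-lib.sh).
# usage: test_expect_code_output <code> <text> <command> [<args>...]
# Succeeds only if <command> exits with <code> and its combined
# stdout+stderr (minus trailing newlines) equals <text>.
test_expect_code_output () {
	want_code=$1
	want_text=$2
	shift 2

	# The assignment's exit status is that of the command substitution.
	got_text=$("$@" 2>&1)
	got_code=$?

	if test "$got_code" != "$want_code"
	then
		echo >&2 "exit code: want $want_code, got $got_code"
		return 1
	fi
	if test "$got_text" != "$want_text"
	then
		echo >&2 "output: want '$want_text', got '$got_text'"
		return 1
	fi
}

# Made-up stand-in for the script under test, with the "fixed" behavior.
fake_portable () {
	if test $# -eq 0
	then
		echo >&2 "Please specify a command"
	else
		echo >&2 "Not a valid command: $1"
	fi
	return 129
}

test_expect_code_output 129 "Please specify a command" fake_portable &&
echo "ok: no command"
```

The point is only that the expected exit code and message live in one
assertion, so the test line itself documents the behavior change.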

What it does entirely omit is the "why".

Now I realize I'm nitpicking a slide shown at a conference, which by its
nature needs to show a small pseudo-example, but I think this applies in
general:

While explaining the "why" is a good rule of thumb, I think it's just as
important to know when to include an explanation and when not to.

For cases where something is straightforward enough (as in this case,
the RHS of ": " is clearly missing) I'd think omitting the explanation
would be better, as we should also be concerned about the overall
signal-to-noise ratio.

(Now, if anyone glances at my own commit messages they'll see I'm
thoroughly in "throwing rocks from a glass house" territory here :) I'm
not saying I'm consistently practicing what I'm preaching).

But just as with comments there's no right answer; when one person
thinks an explanation is needed differs from when another does.

But it is unambiguously the case that we can often replace prose with
tests, and in those cases we should almost always prefer that.

It's also the case that even if everyone agrees that a "why" is needed
there are multiple ways to store that information. One is via commit
messages, another would e.g. be that same commit updating some shared
guidelines about goals/examples of CLI usage.

So in this case, if a Documentation/CodingGuidelines had clear examples
of preferred usage, we could just briefly point to that as
rationale.

While git's commit messages are excellent, I think in-tree documentation
of rationale is one area where we really need improvement. It's rare to
dig into some old code where no rationale can be found for it, either in
the commit itself, or in the preceding ML discussion.

But it's unfortunately (at least in my experience) more often than not
the case that you really do need to consult those commit messages or ML
archives, even for things that have come up a *lot* of times; they were
just never documented in-tree.

There's all sorts of reasons for that which are not the result of any
person doing anything wrong, but I do think it's something we could and
should focus more on as a project.

The barriers of entry for adding documentation or adjusting existing
documentation are much higher than adding a one-off explanation in a
commit message.

Partially (and probably mostly) that's a really good thing, but I can't
help but wonder if we're getting that balance right given the (in my
subjective experience) end result of us often lacking good docs, while
we're not lacking if one searches for replacements for those docs in
commit messages or the ML archive.

One more thing that I think is not explicitly covered (I skimmed the
slides, but haven't gone through the full talk yet): Minimizing diffs.

E.g. the talk shows 287fd17e3a1 (sparse-index: prevent repo root from
becoming sparse, 2022-03-01) as an example, which has this hunk:

	diff --git a/dir.c b/dir.c
	index d91295f2bcd..a136377eb49 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -1463,10 +1463,11 @@ static int path_in_sparse_checkout_1(const char *path,
	 	const char *end, *slash;

	 	/*
	-	 * We default to accepting a path if there are no patterns or
	-	 * they are of the wrong type.
	+	 * We default to accepting a path if the path is empty, there are no
	+	 * patterns, or the patterns are of the wrong type.
	 	 */
	-	if (init_sparse_checkout_patterns(istate) ||
	+	if (!*path ||
	+	    init_sparse_checkout_patterns(istate) ||
	 	    (require_cone_mode &&
	 	     !istate->sparse_checkout_patterns->use_cone_patterns))
	 		return 1;

I think this is a worthwhile thing to consider as a replacement:

	diff --git a/dir.c b/dir.c
	index d91295f2bcd..93a2320ae57 100644
	--- a/dir.c
	+++ b/dir.c
	@@ -1466,7 +1466,8 @@ static int path_in_sparse_checkout_1(const char *path,
	 	 * We default to accepting a path if there are no patterns or
	 	 * they are of the wrong type.
	 	 */
	-	if (init_sparse_checkout_patterns(istate) ||
	+	if (!*path || /* we consider an empty pattern to be no pattern */
	+	    init_sparse_checkout_patterns(istate) ||
	 	    (require_cone_mode &&
	 	     !istate->sparse_checkout_patterns->use_cone_patterns))
	 		return 1;

I.e. trying to optimize for smaller diffs whenever possible. In this
case the word-diff for the original is:

        /*
         * We default to accepting a path if {+the path is empty,+} there are no
         {+*+} patterns{+,+} or [-* they-]{+the patterns+} are of the wrong type.
         */

Now, this is obviously another small isolated example that's not worth
nitpicking in itself; it just serves to make a larger point. It's clear
why the rephrasing was done in that case, because the patch adds the
"!*path" check, so it makes sense a priori to have the comment reflect
that.

But one thing where advice about "narrative structure" and good prose
tends to break down when it comes to software development is that we're
much more focused on reviews of incremental additions than many other
fields, where it tends to be more about the final product.