Re: [PATCH v19 03/20] checkout: add '\n' to reflog message

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 30 Jun 2020 16:58:37 -0700

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> writes:

> Most places that write a reflog message use a message that ends with '\n'.
>
> Some places (the one mentioned here) do not append a '\n'.

Just to make sure I am following, I see

	if (update_ref("initial pull", "HEAD", merge_head, ...

in bultin/pull.c; this is supposed to be written with "\n" under the
new world order?

As reflog messages are supposed to be a single liner (due to the
nature of the original storage format), I have a feeling that it is
cumbersome to the programmers to requre "\n" at the end.  It would
not add much value to the readability of the resulting code, either.

If the places that have hardcoded messages like the above have to be
written like so:

	if (update_ref("initial pull\n", "HEAD", merge_head, ...

it makes the code worse.

Let's see if I can come up with a coherent "rule" that we can follow
to improve the current situation that is an undisciplined mixture as
you found out.

    ... goes and looks the current code ...

So, what happens is that update->msg in each ref transaction (in
refs.c::ref_transaction_add_update()) gets verbatim copy of what the
caller gives the refs API, but when it is written out as an reflog
entry, the message is cleansed by calling refs.c::copy_reflog_msg()
that squashes a run of whitespaces (not just SP but including LF and
HT) into a single SP and then trims whitespaces at the end.  

Then to the result, a LF is unconditionally added.

I would consider the massaging of the application-supplied message
done in copy_reflog_msg() as normalization, which turns possibly
invalid data (like a message that has LF anywhere in it) into an
acceptable form.  Which defines the kosher form of a reflog message
as not having any LF (or excess SP) in it, and the terminating LF is
not part of the message.  From that point of view, if we were to do
something, I would rather standardize on not having the final (and
only) LF.

But some callers that prepare a message in a string buffer may find
it easier not having to strip the trailing LF, or even depend on the
fact that they can feed a string split into multiple lines and the
result becomes a single line.  So I do not see a point in "fixing"
the existing callers that gives an extra LF at the end or other
not-so-kosher form of data---these will be normalized by calling
copy_reflog_msg() anyway.  

If they are easy to fix, like sending a constant string with LF at
the end, we would want to fix them to improve the readability of the
resulting code, though.  If the update_ref() call in builtin/pull.c
were sending "initial pull\n" as its first parameter, it would be
worth fixing by removing the LF, as the result would be easier to
read and it is not too much work.  On the other hand, if the message
that eventually becomes a reflog entry comes from outside and if the
existing code relies on the sanitizing done by copy_reflog_msg(), I
do not see a point in carefully removing the LF at the end on the
caller's side, either.

> I initially thought we could or should fix this across git-core, but I
> think it will be a lot of work to find all occurrences.

I think we are on the same page, even if the definition of which is
correct might be different between us.  As long as all the refs
backend does the same sanitizing (which is probably easy---just call
copy_reflog_msg() from them, or perhaps we may want to move the call
to where update->msg are queued for each step in a transaction,
which would relieve individual backends from having to worry about
this issue at all), they would all behave the same for the same
reflog data.

Thanks.