Re: [PATCH] Documentation: clarify whitespace rules for trailers

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 24 Aug 2022 11:13:38 -0700

Christian Couder <christian.couder@xxxxxxxxx> writes:

> Commit e4319562bc (trailer: be stricter in parsing separators, 2016-11-02)
> restricted whitespaces allowed by `git interpret-trailers` in the "token"
> part of the trailers it reads. This commit didn't update the related
> documentation in Documentation/git-interpret-trailers.txt though.

OK.

> Also commit 60ef86a162 (trailer: support values folded to multiple lines,
> 2016-10-21) updated the documentation, but didn't make it clear how many
> whitespace characters are allowed at the beginning of new lines in folded
> values.
>
> Let's fix both of these issues by rewriting the paragraph describing
> what whitespaces are allowed by `git interpret-trailers` in the trailers
> it reads.
> ---
>  Documentation/git-interpret-trailers.txt | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

Missing sign-off.

> diff --git a/Documentation/git-interpret-trailers.txt b/Documentation/git-interpret-trailers.txt
> index 956a01d184..0e125d795b 100644
> --- a/Documentation/git-interpret-trailers.txt
> +++ b/Documentation/git-interpret-trailers.txt
> @@ -60,10 +60,12 @@ non-whitespace lines before a line that starts with '---' (followed by a
>  space or the end of the line). Such three minus signs start the patch
>  part of the message. See also `--no-divider` below.
>  
> -When reading trailers, there can be whitespaces after the
> -token, the separator and the value. There can also be whitespaces
> -inside the token and the value. The value may be split over multiple lines with
> -each subsequent line starting with whitespace, like the "folding" in RFC 822.
> +When reading trailers, there can be no whitespace inside the token,
> +and only one regular space or tab character between the token and the
> +separator.

That may have been the intent, but does it match the behaviour?

        static ssize_t find_separator(const char *line, const char *separators)
        {
                int whitespace_found = 0;
                const char *c;
                for (c = line; *c; c++) {
                        if (strchr(separators, *c))
                                return c - line;
                        if (!whitespace_found && (isalnum(*c) || *c == '-'))
                                continue;
                        if (c != line && (*c == ' ' || *c == '\t')) {
                                whitespace_found = 1;
                                continue;
                        }
                        break;
                }
                return -1;
        }

When parsing "X  :", first we encounter 'X', we haven't seen
whitespace, 'X' passes isalnum(), and we continue.  Then we
encounter ' ', we haven't seen whitespace but it is neither isalnum
or dash, so we go on without hitting the first continue.  We are not
at the beginning of the line, we are seeing a space, so we remember
the fact that we saw whitespace and continue.  Next we see another ' ',
we do not hit the first continue, we are not at the beginning of the
line, and we are looking at ' ', so we again continue.  Finally, we see
':' that is a separator and we return happily.

The code seems to be allowing zero or more space/tab before the
separator.

Stepping back and reading the original again, I think the original
was almost correct.  There can be whitespaces after the token.
There can be whitespaces after the separator.  There can be
whitespaces after the value.  The only thing it got wrong was that
it pretended to allow whitespaces inside the token, while in reality
we allow whitespaces inside the value but not inside the token.

So, a minimum fix would be to s/token and the value/value/; I do not
mind a more extensive rewriting if it improves clarity and correctness,
but "only one between the token and the separator" is not quite correct.
Besides, that phrasing gives a false impression that the whitespace is
mandatory, but you wanted to express "zero or one" optionality, no?

> There can be whitespaces before, inside or after the
> +value. The value may be split over multiple lines with each subsequent
> +line starting with at least one whitespace, like the "folding" in RFC
> +822.
>  
>  Note that 'trailers' do not follow and are not intended to follow many
>  rules for RFC 822 headers. For example they do not follow