Re: [PATCH 1/3] config.txt: describe whitespace characters further and more accurately

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dragan Simic <dsimic@xxxxxxxxxxx> writes:

> Make it more clear what the whitespace characters are in the context of git
> configuration, improve the description of the trailing whitespace handling,
> and correct the description of the value-internal whitespace handling.
>
> Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>
> ---
>  Documentation/config.txt | 19 +++++++++++--------
>  1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 782c2bab906c..4480bb44203b 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -22,9 +22,10 @@ multivalued.
>  Syntax
>  ~~~~~~
>  
> -The syntax is fairly flexible and permissive; whitespaces are mostly
> -ignored.  The '#' and ';' characters begin comments to the end of line,
> -blank lines are ignored.
> +The syntax is fairly flexible and permissive.  Whitespace characters,
> +which in this context are the space character (SP) and the horizontal
> +tabulation (HT), are mostly ignored.  The '#' and ';' characters begin
> +comments to the end of line.  Blank lines are ignored.

OK, except for "whitespace characters"---do we need to say
"whitespace characters", after we already listed HT and SP are the
ones, instead of just "whitespaces"?

> @@ -64,12 +65,14 @@ The variable names are case-insensitive, allow only alphanumeric characters
>  and `-`, and must start with an alphabetic character.
>  
>  A line that defines a value can be continued to the next line by
> +ending it with a `\`; the backslash and the end-of-line are stripped.
> +Leading whitespace characters after 'name =', the remainder of the
>  line after the first comment character '#' or ';', and trailing
> +whitespace characters of the line are discarded unless they are enclosed
> +in double quotes.  This discarding of the trailing whitespace characters
> +also applies after the remainder of the line after the comment character
> +is discarded.

"also" makes it sound as if we do it twice, once to remove trailing
whitespaces after the remainder of the line after '#", and then trim
the trailing whitespaces after we removed the comment.

I wonder if we can make it clearer by following the step-by-step
nature of the earlier part of the paragraph through.  We already say
the folded line processing is done first, so break things down in
conceptual phases/steps, perhaps like

 * The backslash at the end-of-line is removed, together with the
   end-of-line, to form a single long line.

 * Anything that come after the first unquoted comment character,
   either '#' or ';', are discarded.

 * The leading and trailing whitespaces around the value part
   (i.e. what follows 'name =') are discarded.

 * Remaining unquoted whitespaces inside the value part are munged.

> +Any number of internal whitespace characters found within
> +the value are converted to the same number of space (SP) characters.

The last one sounds like a bug to me, by the way.

At least the very original 17712991 (Add ".git/config" file parser,
2005-10-10) squashed a run of whitespace characters into a single
SP, which makes sense as a "clean-up".

But ebdaae37 (config: Keep inner whitespace verbatim, 2009-07-30),
while claiming to "Keep" inner whitespaces, broke it by replacing
any isspace() bytes that are not SP with SP, contradicting its
stated purpose.

As the latest change by the author of that change is from more than
10 years ago, I do not expect that he is still interested in this
part of the codebase, but thanks to a very clearly written log
message, we can read what the motivation behind that change was, and
seeing that what the code does contradicts with the stated
motivation we can safely declare that this is an ancient bug.

Fixing that bug can of course be left outside the series.  For those
who are looking for microproject ideas who discovered this message
by searching for the #leftoverbits keyword, one possible fix would
be to revert ebdaae37, make sure a value with any whitespace in it
gets quoted, and document clearly that an unquoted run of
whitespaces is squashed into a single SP.  Another way that is
milder is to finish what ebdaae37 wanted to do and retain the
whitespaces "verbatim".

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux