Re: [PATCH v2] pretty-options.txt: describe supported encoding

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 Aug 2021 10:03:56 -0700

Krzysztof Żelechowski <giecrilj@xxxxxxxxxxxx> writes:

> git log recognises only system encodings supported by iconv(1), but not 
> POSIX character maps used by iconv(1p). Document it.
>
> Signed-off-by:  <ne01026@xxxxxxxxxxx>

The "Human Readable Name <email@xxxxxxxxx>" on this line must match
the one on the "From: " line that records the author of the patch.

If you are forwarding somebody else's patch (with or without
improvement), we also need your sign-off.

> diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
> index 27ddaf84a19..4f8376d681b 100644
> --- a/Documentation/pretty-options.txt
> +++ b/Documentation/pretty-options.txt
> @@ -36,9 +36,13 @@ people using 80-column terminals.
>         The commit objects record the encoding used for the log message
>         in their encoding header; this option can be used to tell the
>         command to re-code the commit log message in the encoding
> -       preferred by the user.  For non plumbing commands this
> -       defaults to UTF-8. Note that if an object claims to be encoded
> -       in `X` and we are outputting in `X`, we will output the object
> +       preferred by the user.

> +       The encoding must be a system encoding supported by iconv(1),
> +       otherwise this option will be ignored.
> +       POSIX character maps used by iconv(1p) are not supported.

This paragraph is a bit hard to grok.

I think it is saying that the "-f frommap -t tomap" form in [*1*],
which can use an arbitrary character set description file, is not
supported, but the "-f fromcode -t tocode" form, which is also what
iconv_open() takes [*2*], is supported.  Am I reading it correctly?

Is there an easier-to-read way to explain the distinction to our
average reader?

What I am getting at is this.  Imagine average users who need to see
their commits recoded to iso-8859-2.  They see that "git log" has an
"--encoding=<encoding>" option, read the above paragraph, and wonder
whether they are on its supported side or its unsupported side.  I
want to make it easy for them to stop wondering.

For that purpose, "iconv(1) vs iconv(1p)" would not help them very
much, especially considering that not all Git users are UNIX users
(they probably do not even know what (1) and (1p) mean).
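
To make the question concrete: what such a user really wants to know
is "is my encoding accepted as a *name*, and what happens if it is
not?"  Below is a rough sketch of the behaviour the proposed text
describes, written in Python purely for illustration.  The assumptions
are loud ones: Python's codec registry stands in for iconv_open(), the
helper names are hypothetical, and none of this is how Git implements
it (the set of names a real iconv accepts will differ).

```python
import codecs


def is_known_encoding(name: str) -> bool:
    """Return True if `name` resolves as a codec name.

    Assumption: codecs.lookup() is only a stand-in for iconv_open();
    it takes an encoding *name*, not a character set description file.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def recode_message(raw: bytes, commit_encoding: str, output_encoding: str) -> bytes:
    """Hypothetical helper modelling the documented behaviour."""
    # Same encoding on both sides: copy the bytes verbatim, which means
    # invalid sequences in the original are copied to the output too.
    if commit_encoding.lower() == output_encoding.lower():
        return raw
    # Unknown name on either side: behave as if the option was ignored.
    if not (is_known_encoding(commit_encoding)
            and is_known_encoding(output_encoding)):
        return raw
    try:
        return raw.decode(commit_encoding).encode(output_encoding)
    except UnicodeError:
        # Conversion failed; this sketch falls back to the original bytes.
        return raw
```

With this sketch, recode_message(msg, "utf-8", "iso-8859-2") re-codes
the message, while an unrecognised name leaves it untouched.  The name
comparison here is a plain case-insensitive match, which is cruder
than what a real implementation would want (e.g. "UTF-8" vs "utf8").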

> +       For non-plumbing commands this defaults to UTF-8.

I think I can guess why the patch wants to change "non plumbing" to
"non-plumbing" (I do not strongly care either way, so I'd take the
patch without complaint about that particular change).  It would
have been nicer to mention this change in the proposed commit log
message, but that is minor.

> +       Note that if an object claims to be encoded in `X`
> +       and we are outputting in `X`, we shall output the object
>         verbatim; this means that invalid sequences in the original
>         commit may be copied to the output.

I probably wouldn't have noticed this if a new manual page used
"shall" consistently, but since the original deliberately used
"will" and the patch changes it to "shall", I have to ask: why?

I think our end-user facing manual pages tend to avoid the latter.
We do use "shall" in the RFC 2119/BCP 14 sense on the technical side
of our documentation, where we give requirements to third-party
implementations so that they can interoperate with us, but this is
not such a description.

Thanks.

[References]

*1* https://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html 
*2* https://pubs.opengroup.org/onlinepubs/9699919799/functions/iconv_open.html