Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar

"Kristoffer Haugsbakk" <code@xxxxxxxxxxxxxxx> · Fri, 15 Mar 2024 08:16:53 +0100

On Fri, Mar 15, 2024, at 06:59, Jeff King wrote:
> On Wed, Mar 13, 2024 at 07:23:25PM +0100, Kristoffer Haugsbakk wrote:
>
>> Thanks for your work on this. Now I can use dingbats as my comment char.
>
> Truly we have entered a golden age of technology. ;)

QoL features can in aggregate have a surprising impact :)

>
>> > @@ -523,7 +523,9 @@ core.commentChar::
>> >  	Commands such as `commit` and `tag` that let you edit
>> >  	messages consider a line that begins with this character
>> >  	commented, and removes them after the editor returns
>> > -	(default '#').
>> > +	(default '#'). Note that this option can take values larger than
>> > +	a byte (whether a single multi-byte character, or you
>> > +	could even go wild with a multi-character sequence).
>>
>> I don’t know if this expanded description focuses a bit much on the
>> history of the change[1] or if it is intentionally indirect about this
>> char-is-really-a-string behavior as a sort of easter egg.[2]
>
> Mostly I was worried that people would take "char" in the name to assume
> it could only be a single byte (I had originally even started the new
> sentence with "Despite the word 'char' in the name, this option
> can..."). And that is not just history, but a name we are stuck with
> forever[1].

Missing footnote or referring to my footnote?

My suggestion was to add a `core.commentString` alias, which might
matter for new answers to questions about its use. It might not matter
if, in practice, most people get their config tips from a 1500-point
StackOverflow question from 2011 about how git-commit(1) keeps
swallowing their GitHub issue numbers (due to automatic line wrapping).

> Certainly "char" is an ambiguous term, though. I didn't mean to leave
> char-is-a-string as an easter egg; that's what I meant by
> "multi-character sequence". Certainly "string" is a shorter way of
> saying that. ;) But I wasn't sure its meaning would be obvious without
> the word "multi-character". Giving an example as you suggested does
> help that.
>
> That said...
>
>> Maybe it could be more directly stated like:
>>
>>   “ Note that this variable can in fact be a string like `foo`; it
>>     doesn’t have to be a single character.
>
> I actually do think the "string" nature is mostly uninteresting, and I'd
> be OK leaving it as an easter egg.

To my mind, a string subsumes a char (multi-byte or not). It’s like
programming languages: some people might be used to a single-char `#`,
but I don’t think they do a double take when they see languages that
use `//` or `--`.
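A minimal C sketch of that point, using hypothetical helper names
(`starts_with_prefix`, `strip_comment_lines`) rather than Git’s actual
implementation: if the comment marker is treated as a byte-string
prefix, then `#`, `//`, and a multi-byte character such as `❯` all go
through the same comparison, with no special case for any of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's code): nonzero if `line` begins with
 * the bytes of `prefix`. Only raw bytes are compared, so "#", "//" and
 * a multi-byte UTF-8 character like "❯" are all handled identically.
 */
static int starts_with_prefix(const char *line, const char *prefix)
{
	/* an empty prefix would match every line, so reject it */
	assert(prefix != NULL && *prefix != '\0');
	return strncmp(line, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical helper: copy `in` to `out`, dropping every '\n'-separated
 * line that begins with `prefix`. `out` must hold strlen(in) + 1 bytes.
 */
static void strip_comment_lines(const char *in, const char *prefix, char *out)
{
	const char *line = in;
	char *dst = out;

	while (*line) {
		const char *nl = strchr(line, '\n');
		/* length of this line, including its '\n' if present */
		size_t len = nl ? (size_t)(nl - line) + 1 : strlen(line);

		if (!starts_with_prefix(line, prefix)) {
			memcpy(dst, line, len);
			dst += len;
		}
		line += len;
	}
	*dst = '\0';
}
```

For example, with the prefix `"❯"` a message line like `#123 is fixed`
survives, which is exactly the issue-number case above. Since nothing
here decodes characters, the same comparison would also work for a
non-UTF-8 prefix.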

> What your suggestion doesn't say is that multi-byte characters are
> OK. But if we think people will just assume that in a modern UTF-8
> world, then maybe we don't need to say anything at all?

Given that we’re mostly in the context of a commit message, an
ASCII-only restriction would feel archaic.

I guess it depends on what the *norm* is in the documentation at
large. As a user I’m used to Git handling the text that I give it.

> It actually does not have to be UTF-8.

Good point. “Unicode” is the more appropriate word, then.

> (Though to be clear, I think anybody using non-UTF8 in 2024 deserves
> our pity either for being crazy or for being stuck working on an
> antiquated system).

I honestly feel blessed that I have to worry so little about text
encoding.

-- 
Kristoffer Haugsbakk