Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 08 Mar 2024 07:58:54 -0800

Phillip Wood <phillip.wood123@xxxxxxxxx> writes:

> I agree with your analysis. I do wonder though if we should reject
> whitespace and control characters when parsing core.commentChar, it
> feels like accepting them is a bug waiting to happen. If
> comment_line_char starts with ' ' or '\t' that part will be eaten by
> the strspn() above and so starts_with_mem() wont match. Also we will
> never match a comment if comment_line_str contains '\n'.

Another thing I was wondering is what we want to do a random
byte-sequence that may match from the middle of a multi-byte UTF-8
character.

The reason I haven't mentioned these "nonsense input" is because
they will at worst only lead to self-denial-of-service to those who
are too curious, and will fall into "don't do it then" category.

Also, what exactly is the definition of "nonsense" will become can
of worms.  I can sympathise if somebody wants to use "#\t" to give
themselves a bit more room than usual on the left for visibility,
for example, so there might be a case to want whitespace characters.

>> Not sure why lines that start with CR are considered comment lines,
>> though.
>
> I think it is a lazy way of looking for an empty line ending in CR LF,
> it should really be
>
> 	|| (bol[0] == '\r' && bol[1] == '\n') ||

My recollection matches your speculation. 

IIRC the lazy persono was probably me but I didn't run "git blame".