Re: [PATCH] mailinfo: support Unicode scissors

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 01 Apr 2019 18:07:22 +0900

SZEDER Gábor <szeder.dev@xxxxxxxxx> writes:

> On Mon, Apr 01, 2019 at 12:01:04AM +0200, Andrei Rybak wrote:
>> diff --git a/mailinfo.c b/mailinfo.c
>> index b395adbdf2..4ef6cdee85 100644
>> --- a/mailinfo.c
>> +++ b/mailinfo.c
>> @@ -701,6 +701,13 @@ static int is_scissors_line(const char *line)
>>  			c++;
>>  			continue;
>>  		}
>> +		if (!memcmp(c, "✂", 3)) {
>
> This character is tiny.  Please add a comment that it's supposed to be
> a Unicode scissors character.
>
> Should we worry about this memcmp() potentially reading past the end
> of the string when 'c' points to the last character?
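
To make the over-read concern concrete, here is a minimal sketch,
assuming 'c' always points into a NUL-terminated string.  The helper
name and the macro are hypothetical and are not part of the patch.
Unlike memcmp(), strncmp() stops at the terminating NUL, so it never
looks past the end of a short string:

```c
#include <string.h>

/* UTF-8 encoding of U+2702 BLACK SCISSORS is the 3 bytes E2 9C 82. */
#define SCISSORS_UTF8 "\xe2\x9c\x82"

/*
 * Hypothetical helper (not in mailinfo.c): returns 1 when the
 * NUL-terminated string at 'c' begins with U+2702.  strncmp() stops
 * comparing at the first NUL, so a string shorter than 3 bytes is
 * rejected without reading beyond its terminator.
 */
static int starts_with_unicode_scissors(const char *c)
{
	return !strncmp(c, SCISSORS_UTF8, 3);
}
```

With this, a truncated sequence such as a lone "\xe2" at the end of
the line is simply reported as "no match".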

Quite honestly, I'd rather document exactly what a "scissors" line
looks like, and make sure no reader mistakenly thinks that we accept
any Unicode character whose name contains the substring "scissors".

Ah, wait, we already do.  It is very clear that scissors are either
">8" (for right handers) or "8<" (for lefties) and nothing else.
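
As a rough illustration of how small that rule is, here is a
deliberately simplified sketch.  It is not the actual code in
mailinfo.c, whose check is more involved; the function name is
hypothetical, and the "at least one dash" requirement is an
assumption made only for this example:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical, simplified recognizer (not the real mailinfo.c
 * logic): returns 1 when the line contains at least one ">8" or
 * "8<" token and at least one '-' of perforation.
 */
static int looks_like_scissors_line(const char *line)
{
	int scissors = 0, dashes = 0;
	const char *c;

	assert(line != NULL);
	for (c = line; *c; c++) {
		if (*c == '-') {
			dashes++;
		} else if (!strncmp(c, ">8", 2) || !strncmp(c, "8<", 2)) {
			scissors++;
			c++;	/* skip the second byte of the token */
		}
	}
	return scissors > 0 && dashes > 0;
}
```

Under these assumptions "-- >8 --" and "-- 8< --" are accepted, while
a line using U+2702 in place of the ASCII token is not.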

Unless you are sure that you are (and more importantly, can remain)
exhaustive, adding allowed representations for a thing forces users
to learn more non-essential things ("we allow only 8< and >8" vs "we
allow only these 7, even though we are aware that there are at least
14 more that we do not allow"; the end-user then needs to remember
which 7 are allowed) and does not help them.

Taking only "black scissors" U+2702 but not all of U+2700 - U+2704
will cause unnecessary end-user complaints of the form "why do you
take this one but not that one?"  The next round of noise would then
be "why is '-' the only perforation and not U+2014 Em Dash or U+2013
En Dash?"

Let's try not to be cute in non-essential things like how a pair of
scissors ought to be spelled.  If "8<" has worked well for us for
the past 10 years, we should just stick to it.