Re: [PATCH v2 4/5] mailmap: use case-sensitive comparisons for local-parts and names

Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> · Tue, 12 Jan 2021 15:08:48 +0100

On Wed, Jan 06 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes:
>
>> On Sun, Jan 03 2021, brian m. carlson wrote:
>>
>> We just have to worry about cases where you're not all of these people
>> in one project's commit metadata and/or .mailmap, and thus mailmap rules
>> would match too generously:
>>
>>     "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx>
>>     "brian m. carlson" <SANDALS@xxxxxxxxxxxxxxxxxxxx>
>>     "BRIAN M. CARLSON" <sandals@xxxxxxxxxxxxxxxxxxxx>
>>     "BRIAN M. CARLSON" <SANDALS@xxxxxxxxxxxxxxxxxxxx>
>>
>> Is that really plausible? In any case, neither of these two patches make
>> reference to us already having changed this in the past in 1.6.2 and &
>> there being reports on the ML about the bug & us changing it back. See
>> https://lore.kernel.org/git/f182fb1700e8dea15459fd02ced2a6e5797bec99.1238458535u.git.johannes.schindelin@xxxxxx/
>>
>> Maybe we should still do this, but I think for a v3 it makes sense to
>> summarize that discussion etc.
>
> After reading the old discussion again, I am not sure if this is
> worth doing.  To many people, it is a promise we've made and kept
> that we treat addresses including the local part case-insensitively
> when summarising commits by ident strings.
>
> I'd really wish that this series were structured to have 5/5 early
> and 3&4/5 squashed into a single final patch.

Something that I only realized after I sent
<87czykvg19.fsf@xxxxxxxxxxxxxxxxxxx>: Any problems .mailmap has with
Turkish "dotless I" we have already with "git log --author=<name> -i".

Not to say that there isn't some problem to solve here, just that if we
do it's a more general issue than mailmap.

As can be inferred from my upthread reply I thought that was ASCII-only,
but it turns out we do set LC_CTYPE based on the user's locale, and it
also applies for English-speakers. E.g. in LANG=en_US.UTF-8
"--author=ævar -i" will work.

Of course that doesn't address the point of whether we should be
DWYM-ing E-Mail addresses, just the sub-claim that one reason we
shouldn't is because of impossible-to-solve Unicode edge cases.