Currently, Git always looks up entries in the mailmap in a
case-insensitive way, both for names and addresses, which is, as
explained below, suboptimal.

First, for email addresses, RFC 5321 is clear that only domains are case
insensitive; local-parts (the portion before the at sign) are not. It
states this:

  The local-part of a mailbox MUST BE treated as case sensitive.
  Therefore, SMTP implementations MUST take care to preserve the case of
  mailbox local-parts.

There exist systems today where local-parts remain case sensitive (and
this author has one), and as such, it's incorrect for us to case fold
them in any way. Let's add a failing test that indicates this is a
problem, while still keeping the test for case-insensitive domains.

Note that it's also incorrect for us to case-fold names because we don't
guarantee that we're using the locale of the author, and it's impossible
to case-fold names in a locale-insensitive way. Turkish and Azeri
contain both a dotted and dotless I, and the uppercase ASCII I folds not
to the lowercase ASCII I, but to a dotless version, and vice versa with
the lowercase I. There are many words in Turkish which differ only in
the dottedness of the I, so it is likely that there are also personal
names which differ in the same way.

That would be a problem even if our implementation were perfect, which
it is not. We currently fold only ASCII characters, so this feature has
never worked correctly for the vast majority of the users on the planet,
regardless of the locale. For example, on Linux, even in a Spanish
locale, we don't handle "Simón" properly. Even if we did handle that,
we'd probably also want to implement Unicode normalization, which we
don't.
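The ASCII-only limitation can be illustrated with a small shell sketch.
It is not Git's mailmap code: `ascii_fold` is a hypothetical helper,
written for this illustration only, that lowercases just the bytes A-Z
in the C locale. The non-ASCII names are spelled as UTF-8 octal escapes
so the snippet does not depend on the encoding of the file it lives in.

```shell
# Hypothetical helper (not part of Git): lowercase only ASCII A-Z,
# byte by byte, leaving every other byte untouched.
ascii_fold () {
	printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

upper=$(printf 'SIM\303\223N')	# "SIMÓN" encoded as UTF-8
lower=$(printf 'sim\303\263n')	# "simón" encoded as UTF-8

# Pure-ASCII names fold as one would expect.
ascii_name=$(ascii_fold 'NICK1')

# The two bytes encoding "Ó" are not in A-Z, so they stay uppercase and
# the folded string does not equal the lowercase spelling.
folded=$(ascii_fold "$upper")
if test "$folded" = "$lower"
then
	result=match
else
	result=mismatch
fi

echo "$ascii_name"
echo "$result"
```

Under these assumptions, "NICK1" and "nick1" compare equal after folding
while "SIMÓN" and "simón" do not, so an ASCII-only fold treats the two
kinds of names inconsistently.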
In general, case-folding text is extremely language- and locale-specific
and requires intimacy with the spelling and grammar of the language in
question and careful attention to the Unicode details in order to
produce a result that is meaningful to humans and conforms with
linguistic and societal norms.

Because we do not have any of the required context with a plain personal
name, we cannot hope to possibly case-fold personal names correctly. We
should stop trying to do so and just treat them as a series of bytes, so
let's add a test that we don't case-fold personal names as well.

Signed-off-by: brian m. carlson <sandals@xxxxxxxxxxxxxxxxxxxx>
---
 t/t4203-mailmap.sh | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 586c3a86b1..32e849504c 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -170,10 +170,35 @@ Repo Guy (1):
 
 EOF
 
-test_expect_success 'name entry after email entry, case-insensitive' '
+test_expect_success 'name entry after email entry, case-insensitive domain' '
 	mkdir -p internal_mailmap &&
 	echo "<bugs@xxxxxxxxxx> <bugs@xxxxxxxxxx>" >internal_mailmap/.mailmap &&
-	echo "Internal Guy <BUGS@xxxxxxxxxx>" >>internal_mailmap/.mailmap &&
+	echo "Internal Guy <bugs@xxxxxxxxxx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<\EOF
+Repo Guy (1):
+      initial
+
+nick1 (1):
+      second
+
+EOF
+
+test_expect_failure 'name entry after email entry, case-sensitive local-part' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@xxxxxxxxxx> <bugs@xxxxxxxxxx>" >internal_mailmap/.mailmap &&
+	echo "Internal Guy <BUGS@xxxxxxxxxx>" >>internal_mailmap/.mailmap &&
+	git shortlog HEAD >actual &&
+	test_cmp expect actual
+'
+
+test_expect_failure 'name entry after email entry, case-sensitive personal name' '
+	mkdir -p internal_mailmap &&
+	echo "<bugs@xxxxxxxxxx> <bugs@xxxxxxxxxx>" >internal_mailmap/.mailmap &&
+	echo "Nick1 <bugs@xxxxxxxxxx> NICK1 <bugs@xxxxxxxxxx>" >internal_mailmap/.mailmap &&
 	git shortlog HEAD >actual &&
 	test_cmp expect actual
 '