On Thu, Nov 18, 2021 at 02:04:08AM -0800, Carlo Arenas wrote: > > I haven't tested it, but I think that for this to work with the log, > we also need to make sure that all log entries that might not be UTF-8 > get first iconv() which is why probably Æevar mentioned[1] > i18n.commitEncoding in his old email. Having tested it, it seems this might work, but needs at least the following to make the test for it meaningful: diff --git a/t/t7812-grep-icase-non-ascii.sh b/t/t7812-grep-icase-non-ascii.sh index 1da6b07a57..e1fbdc0f80 100755 --- a/t/t7812-grep-icase-non-ascii.sh +++ b/t/t7812-grep-icase-non-ascii.sh @@ -47,6 +47,8 @@ test_expect_success REGEX_LOCALE 'grep string with regex, with -F' ' ' test_expect_success REGEX_LOCALE 'pickaxe -i on non-ascii' ' + GIT_COMMITTER_NAME="C O Mitter" && + GIT_COMMITTER_EMAIL="committer@xxxxxxxxxxx" && git commit -m first && git log --format=%f -i -S"TILRAUN: HALLÓ HEIMUR!" >actual && echo first >expected && @@ -72,10 +74,10 @@ test_expect_success GETTEXT_LOCALE,PCRE 'log --committer with an ascii pattern o EOF test_write_lines "fifth" >file5 && git add file5 && - GIT_COMMITTER_NAME="Ç O Mîtter" && + GIT_COMMITTER_NAME=$(echo "Ç O Mîtter" | iconv -t ISO-8859-1) && GIT_COMMITTER_EMAIL="committer@xxxxxxxxxxx" && git -c i18n.commitEncoding=latin1 commit -m thïrd && - git -c i18n.logOutputEncoding=latin1 log -1 --pretty=fuller --color=always --perl-regexp --committer=" O.*" >log && + git log -1 --pretty=fuller --color=always --perl-regexp --committer=" O.*" >log && grep Commit: log >actual.raw && test_decode_color <actual.raw >actual && test_cmp expected actual The first hunk is a nice to have and mimics what is done in the previous test where a non UTF match will match instead an equivalent ASCII text, but is not strictly needed unless the expression is also tightened. Note the second hunk is incomplete as the message is still not really encoded as latin1 and will need an equivalent processing, but left it out so it can be done together with fixing the corresponding test. I have to admit that adding to a complex condition might be obscuring some other edge case, and indeed the fact the test passed even without this fix is concerning. >From my reading of what Ævar suggested originally[1], it would seem that the new is_log condition should be on an branch of its own, and more code should be needed to make sure the contents we are going to use are indeed utf8 by the time it hits pcre2_*match() before setting it. Carlo [1] https://lore.kernel.org/git/211116.86tugcp8mg.gmgdl@xxxxxxxxxxxxxxxxxxx/