Jeff King <peff@xxxxxxxx> writes:

>   1. I suppose we could also use $LANG or one of the $LC_* variables to
>      guess at the encoding of the user's pattern. But I think using the
>      output encoding makes the most sense, since then the pattern you
>      searched for will actually be in the output.

I agree.  In addition, if we were to do anything with LANG/LC_CTYPE,
it should be done at the layer that implements log-output-encoding
(e.g. lack of configured encoding with nonstandard LANG/LC_CTYPE
would use the locale, or something), I think.

>   2. There are still problems with utf8 normalization. E.g., my tests
>      represent utf-8 é with \xc3\xa9 (the code point for that glyph),
>      but it could also be represented by \x65\xcc\x81 (e + combining
>      acute). But that is not a new problem; it is an inherent issue with
>      grepping utf8. We might in the future want to offer an option to
>      normalize utf8 (or possibly the regex library can be taught to
>      handle this).

True; in either case, this caller (or any other caller) should not
care.  Only grep_buffer() (actually, grep_source_1()) needs to be
taught about it.

>   4. I'm still not clear on why "--graph --no-walk" wants to look at
>      commit_match after we have already cleared the commit buffer. I
>      agree it's nonsensical, but I wonder if it might be a symptom of an
>      underlying bug or inefficiency.

Yeah, that may be something we may want to check, I agree.

The added test is also nice.  Thanks.

> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> new file mode 100755
> index 0000000..52a7472
> --- /dev/null
> +++ b/t/t4210-log-i18n.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='test log with i18n features'
> +. ./test-lib.sh
> +
> +# two forms of é
> +utf8_e=$(printf '\303\251')
> +latin1_e=$(printf '\351')
> +
> +test_expect_success 'create commits in different encodings' '
> +	test_tick &&
> +	cat >msg <<-EOF &&
> +	utf8
> +
> +	t${utf8_e}st
> +	EOF
> +	git add msg &&
> +	git -c i18n.commitencoding=utf8 commit -F msg &&
> +	cat >msg <<-EOF &&
> +	latin1
> +
> +	t${latin1_e}st
> +	EOF
> +	git add msg &&
> +	git -c i18n.commitencoding=ISO-8859-1 commit -F msg
> +'
> +
> +test_expect_success 'log --grep searches in log output encoding (utf8)' '
> +	cat >expect <<-\EOF &&
> +	latin1
> +	utf8
> +	EOF
> +	git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep searches in log output encoding (latin1)' '
> +	cat >expect <<-\EOF &&
> +	latin1
> +	utf8
> +	EOF
> +	git log --encoding=ISO-8859-1 --format=%s --grep=$latin1_e >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep does not find non-reencoded values (utf8)' '
> +	>expect &&
> +	git log --encoding=utf8 --format=%s --grep=$latin1_e >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep does not find non-reencoded values (latin1)' '
> +	>expect &&
> +	git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
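
To make the normalization point in item 2 concrete, here is a minimal
sketch in plain POSIX shell, kept outside the test suite.  The helper
names contains_bytes and report, and the variables nfc_e and nfd_e, are
made up for this illustration and are not part of the patch.  The search
is deliberately byte-wise (LC_ALL=C grep -F); that is an assumption
about how a matcher that does no normalization behaves, not a statement
about what git's grep code does internally.

```shell
#!/bin/sh
# Two byte sequences that render as the same glyph (é).
nfc_e=$(printf '\303\251')     # U+00E9, precomposed: bytes c3 a9
nfd_e=$(printf 'e\314\201')    # U+0065 + U+0301 combining acute: bytes 65 cc 81

# Hypothetical helper: succeed if haystack $2 contains the exact bytes of $1.
contains_bytes () {
	printf '%s\n' "$2" | LC_ALL=C grep -F -q -e "$1"
}

# Hypothetical helper: print "match" or "no match" for the same arguments.
report () {
	if contains_bytes "$1" "$2"
	then
		echo "match"
	else
		echo "no match"
	fi
}

report "$nfc_e" "t${nfc_e}st"    # same form on both sides
report "$nfc_e" "t${nfd_e}st"    # precomposed pattern, decomposed text
```

The first call prints "match" and the second prints "no match", even
though both strings display as "tést".  Re-encoding the commit message
into the output encoding does not help here, because both forms are
already valid utf8; only a normalization step before matching would
make them compare equal.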