Nguyen, thanks for the help and the patch. The escaping suggested by Scharfe also seems a good choice. But I dug some more into the problem and found a few other things, which is why I replied on the main thread and not on the patch. I hope you'll excuse me if this is bad practice.

git grep -i -P also does not work, because PCRE_UTF8 is not set and the PCRE library does not treat the string as UTF-8.

The pickaxe search also uses kwsearch, so case-insensitive search with it does not work either (e.g. git log -i -S). Maybe this is less of a problem there, as one is expected to search for an exact string (and hence knows the case).

There is an interesting corner case: is_fixed treats all patterns containing NUL bytes as fixed. So what happens if the string contains non-ASCII characters as well as NULs and the search is case insensitive? :) I have to admit that my knowledge of UTF-8 is not enough to say whether this could occur in normal usage, for example if the second byte of a multi-byte character were NUL. I would guess it cannot, since that would break a lot of programs that depend on NUL-terminated strings, but it would be good if somebody could confirm.

GNU grep indeed uses escaped regular expressions when the string uses a multi-byte encoding and the search is case insensitive. If the encoding is UTF-8, this strategy could be used in git too, especially since git already has support and helper functions for working with UTF-8. For other multi-byte encodings I think things would become more complicated; as far as I know, in UTF-16, for example, the '{' character is two bytes, not one. Maybe support could really be added only for UTF-8, with a warning issued if the string is not UTF-8. So maybe the following makes sense when a grep search is performed:

* check whether a multi-byte encoding is used.
  If it is, the search is case insensitive, and the encoding is not UTF-8, give a warning;
* if PCRE is used and the string is UTF-8 encoded, set the PCRE_UTF8 flag;
* if the search is case insensitive, the string is fixed, and the encoding is UTF-8, use regcomp instead of kwsearch and escape any regex special characters in the pattern.

And the question of how the pickaxe search should behave remains open. Using kwset does not work for case-insensitive non-ASCII searches. Instead of fixing grep.c, maybe it's better to introduce a new function that performs keyword searches, so it could be used by grep, diffcore-pickaxe, and any other code that may need such functionality in the future. Or maybe diffcore-pickaxe should use grep instead of using kwset/regcomp directly.

Regards,
Plamen Totev

>-------- Original message --------
>From: Duy Nguyen pclouds@xxxxxxxxx
>Subject: Re: Git grep does not support multi-byte characters (like UTF-8)
>To: Plamen Totev <plamen.totev@xxxxxx>
>Sent: 06.07.2015 15:23
>
> I think we over-optimized a bit. If your system provides regex
> with locale support (e.g. Linux) and you don't explicitly use the fallback
> regex implementation, it should work. I suppose your test patterns
> look "fixed" (i.e. no regex special characters)? Can you try just adding
> "." and see if case-insensitive matching works?

This is indeed the problem. When I added the ".", the matching works just fine.
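Adding "." works because the pattern is no longer "fixed" and goes through regcomp, which honours the locale; the third point of the proposal is to do the same thing deliberately for fixed strings. A minimal sketch of that idea, assuming a POSIX <regex.h> with locale support and extended (ERE) syntax; escape_ere() and compile_fixed_icase() are hypothetical helper names, not existing git functions. Escaping byte by byte is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence lies in 0x80-0xFF and so can never be mistaken for an ASCII metacharacter. That does not hold for every multi-byte encoding: in Shift_JIS, for example, the second byte of a character can be 0x5C, the backslash.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of `fixed` in which
 * every POSIX ERE special character is preceded by a backslash, so the
 * result matches `fixed` literally when compiled with REG_EXTENDED.
 * Bytes >= 0x80 (all bytes of multi-byte UTF-8 characters) are copied
 * unchanged. The caller frees the result; NULL means allocation failed.
 */
static char *escape_ere(const char *fixed)
{
	static const char specials[] = ".[\\()*+?{|^$";
	size_t len = strlen(fixed);
	/* Worst case: every byte needs a backslash in front of it. */
	char *out = malloc(2 * len + 1);
	char *p = out;

	if (!out)
		return NULL;
	for (; *fixed; fixed++) {
		if (strchr(specials, *fixed))
			*p++ = '\\';
		*p++ = *fixed;
	}
	*p = '\0';
	return out;
}

/*
 * Hypothetical helper: compile a fixed string for case-insensitive
 * matching with the system regex engine instead of a byte-wise keyword
 * search. Returns 0 on success or a regcomp() error code.
 */
static int compile_fixed_icase(regex_t *re, const char *fixed)
{
	char *escaped = escape_ere(fixed);
	int ret;

	if (!escaped)
		return REG_ESPACE;
	ret = regcomp(re, escaped, REG_EXTENDED | REG_ICASE | REG_NOSUB);
	free(escaped);
	return ret;
}
```

For non-ASCII letters to be folded, the program has to call setlocale(LC_CTYPE, "") under a UTF-8 locale before compiling; in the default C locale REG_ICASE will not fold them. Patterns containing NUL bytes still cannot take this route, since regcomp only accepts a NUL-terminated string.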