On 07.07.2015 at 02:02, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> On Tue, Jul 7, 2015 at 3:10 AM, René Scharfe <l.s.r@xxxxxx> wrote:
> > On 06.07.2015 at 14:42, Nguyễn Thái Ngọc Duy wrote:
> > So the optimization before this patch was that if a string was searched
> > for without -F then it would be treated as a fixed string anyway unless
> > it contained regex special characters. Searching for fixed strings using
> > the kwset functions is faster than using regcomp and regexec, which makes
> > the exercise worthwhile.
> >
> > Your patch disables the optimization if non-ASCII characters are
> > searched for because kwset handles case transformations only for ASCII
> > chars.
> >
> > Another consequence of this limitation is that -Fi (explicit
> > case-insensitive fixed-string search) doesn't work properly with
> > non-ASCII chars either. How can we handle this one? Fall back to regcomp
> > by escaping all special characters? Or at least warn?
>
> Hehe.. I noticed it too shortly after sending the patch. I was torn
> between simply documenting the limitation and waiting for the next
> person to come and fix it, or quoting the regex and then passing it to
> regcomp. GNU grep does the quoting in this case, but that code is
> GPLv3 so we can't simply copy it over. It could be a problem if we need
> to quote a regex in a multibyte charset where ASCII is not a subset.
> But I guess we can just go with UTF-8..

I played with the code a little and came up with this function to escape
regular expressions in UTF-8. Hope it helps.
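The decision René describes (use the fast fixed-string path unless the pattern contains regex specials, and give up on it for case-insensitive non-ASCII input) can be sketched as below. This is a standalone illustration, not git's grep.c: `is_regex_meta()` and `can_use_fixed_match()` are hypothetical names, the metacharacter set is an assumption (the POSIX ERE specials), and kwset's ASCII-only case folding is modeled by rejecting any byte >= 0x80 when ignoring case.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed set of regex metacharacters (POSIX ERE); hypothetical stand-in
 * for git's is_regex_special(). */
static bool is_regex_meta(unsigned char c)
{
	/* strchr() would match the terminating NUL, so exclude it explicitly. */
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/*
 * Hypothetical helper: returns true if the pattern can be handed to a
 * fixed-string matcher (such as kwset) instead of regcomp()/regexec().
 * When ignore_case is set, any non-ASCII byte also rules it out, because
 * the fixed-string matcher is assumed to fold case for ASCII only.
 */
static bool can_use_fixed_match(const char *pattern, size_t len,
				bool ignore_case)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char c = (unsigned char)pattern[i];

		if (is_regex_meta(c))
			return false;
		if (ignore_case && c >= 0x80)
			return false;
	}
	return true;
}
```

For example, "a.b" is rejected because of the dot, while "caf\xc3\xa9" ("café" in UTF-8) stays on the fixed-string path only when case is not ignored.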
static void escape_regexp(const char *pattern, size_t len,
			  char **new_pattern, size_t *new_len)
{
	const char *p = pattern;
	/* Worst case: every byte is escaped, plus one byte for the NUL. */
	char *np = *new_pattern = xmalloc(2 * len + 1);
	int chrlen;

	while (len) {
		/* Advance p past one UTF-8 character; len shrinks by its size. */
		chrlen = mbs_chrlen(&p, &len, "utf-8");
		/* Only single-byte (ASCII) characters can be regex specials. */
		if (chrlen == 1 && is_regex_special(*pattern))
			*np++ = '\\';
		memcpy(np, pattern, chrlen);
		np += chrlen;
		pattern = p;
	}
	*np = '\0';	/* regcomp() expects a NUL-terminated string */
	*new_len = np - *new_pattern;
}

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
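The function above relies on git internals (xmalloc(), mbs_chrlen(), is_regex_special()). To try the idea outside git, here is a minimal sketch using only libc. The helpers `utf8_chrlen()`, `is_regex_meta()`, `escape_regexp_utf8()` and `fixed_icase_search()` are hypothetical, and the metacharacter set is an assumption. `fixed_icase_search()` shows the -Fi fallback René suggests: escape the pattern, then compile it with regcomp() and REG_ICASE.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed ERE metacharacters; stand-in for git's is_regex_special(). */
static bool is_regex_meta(unsigned char c)
{
	return c != '\0' && strchr("$()*+.?[\\^{|", c) != NULL;
}

/*
 * Hypothetical stand-in for git's mbs_chrlen(): byte length of the UTF-8
 * sequence whose lead byte is s[0], clamped to the bytes left. Continuation
 * bytes are not validated; invalid lead bytes count as one byte.
 */
static size_t utf8_chrlen(const unsigned char *s, size_t left)
{
	unsigned char c = s[0];
	size_t n = c < 0xC2 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : c < 0xF5 ? 4 : 1;

	return n <= left ? n : 1;
}

/* Same loop as escape_regexp(), libc only; returns a malloc'd string. */
static char *escape_regexp_utf8(const char *pattern, size_t len)
{
	char *out = malloc(2 * len + 1), *np = out;

	if (!out)
		return NULL;
	while (len) {
		size_t n = utf8_chrlen((const unsigned char *)pattern, len);

		if (n == 1 && is_regex_meta((unsigned char)*pattern))
			*np++ = '\\';
		memcpy(np, pattern, n);
		np += n;
		pattern += n;
		len -= n;
	}
	*np = '\0';
	return out;
}

/* Case-insensitive literal search: 1 on match, 0 if none, -1 on error. */
static int fixed_icase_search(const char *needle, const char *haystack)
{
	regex_t re;
	char *escaped = escape_regexp_utf8(needle, strlen(needle));
	int ret;

	if (!escaped)
		return -1;
	/* REG_EXTENDED: in a BRE, "\(" or "\{" would be operators, not literals. */
	ret = regcomp(&re, escaped, REG_EXTENDED | REG_ICASE | REG_NOSUB);
	free(escaped);
	if (ret)
		return -1;
	ret = regexec(&re, haystack, 0, NULL, 0) == 0;
	regfree(&re);
	return ret;
}
```

Two caveats apply to this sketch. Whether REG_ICASE folds non-ASCII letters depends on the C library and on the locale chosen with setlocale(); the sketch never calls setlocale(), so in the default "C" locale only ASCII is folded. Also, in UTF-8 every byte of a multi-byte sequence is >= 0x80, so none can be mistaken for an ASCII metacharacter; walking whole characters matters mainly for encodings such as Shift_JIS, whose trail bytes can include 0x5C ('\').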