Re: [PATCH v2 1/2] grep: use static trans-case table

On Tue, Feb 28, 2012 at 04:20:30PM -0800, Junio C Hamano wrote:

> In order to prepare the kwset machinery for a case-insensitive search, we
> used to use a static table of 256 elements and filled it every time before
> calling kwsalloc().  Because the kwset machinery will never modify this
> table, just allocate a single instance globally and fill it at the compile
> time.

Hmm. I was going to complain that the original code used tolower() to
generate the table at run-time, and therefore respected the current
locale. But of course we have replaced tolower() with a
locale-independent version, so it should behave identically.
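
As a rough sketch of why the two approaches should agree: if the lowercase
function only ever touches ASCII, a table filled at run-time always comes
out the same, so it can just as well be a constant. The names below
(ascii_tolower, fill_trans_table, count_remapped) are made up for
illustration and are not the ones used in git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a locale-independent tolower(): ASCII only. */
static unsigned char ascii_tolower(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Fill a 256-entry translation table the way run-time code might. */
static void fill_trans_table(unsigned char table[256])
{
	int i;

	assert(table != NULL);
	for (i = 0; i < 256; i++)
		table[i] = ascii_tolower((unsigned char)i);
}

/* Count the entries that do not map a byte to itself. */
static int count_remapped(const unsigned char table[256])
{
	int i, n = 0;

	for (i = 0; i < 256; i++)
		if (table[i] != (unsigned char)i)
			n++;
	return n;
}
```

With this folding function, only the 26 bytes 'A' through 'Z' are remapped,
and bytes above 0x7f are left alone regardless of the locale.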

But that does make me wonder. Do people expect their case-insensitive
searches to work on non-ASCII characters? I would think yes, but I do
not use non-ASCII characters in the first place, so my opinion may not
mean much.

For that matter, does REG_ICASE respect locales? The glibc code appears
to consider it, but I couldn't make it work in some simple tests. But if
it does, that raises another weirdness: we fall back to kwset
transparently when a grep pattern contains no metacharacters. So you
would get different results for "-i --grep=é" versus "-i --grep=é.*".
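
A minimal sketch of the kind of test I mean, using the POSIX regcomp() and
regexec() calls. The helper name icase_regex_match is hypothetical, and this
is not how git itself invokes the regex code.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if `text` matches `pattern` under
 * REG_ICASE, 0 if it does not, and -1 if the pattern fails to compile.
 */
static int icase_regex_match(const char *pattern, const char *text)
{
	regex_t re;
	int status;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern, REG_ICASE | REG_NOSUB) != 0)
		return -1;
	status = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return status == 0 ? 1 : 0;
}
```

ASCII patterns fold as expected. To probe the locale question, one could
call setlocale(LC_ALL, "") first and then match an uppercase accented
character against its lowercase form. The result depends on the C library
and the active locale, so I am not stating an expected outcome here.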

Of course, even if we had used a locale-respecting version of tolower() in
the original code, I suspect that a byte table would be fundamentally
insufficient anyway in the face of multi-byte encodings like UTF-8.
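
To make the multi-byte point concrete, here is a small sketch. In UTF-8,
"É" (U+00C9) is the bytes 0xC3 0x89 and "é" (U+00E9) is 0xC3 0xA9. A
comparison that folds one byte at a time with an ASCII-only table sees
different second bytes and reports no match. The function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only case folding for a single byte. */
static unsigned char fold_ascii(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/*
 * Compare two NUL-terminated strings byte by byte after folding each
 * byte. Returns 1 if they are equal under that folding, 0 otherwise.
 */
static int bytewise_icase_equal(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	while (*a && *b) {
		if (fold_ascii((unsigned char)*a) != fold_ascii((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}
```

Here bytewise_icase_equal("GREP", "grep") returns 1, while
bytewise_icase_equal("\xc3\x89", "\xc3\xa9") returns 0. Adding a table
entry that maps 0x89 to 0xA9 would not fix this, because the same
continuation bytes appear in unrelated characters: "ĉ" (U+0109) is
0xC4 0x89 and "ĩ" (U+0129) is 0xC4 0xA9.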

So I don't think your patch is making the problem any worse. And even if
somebody wants to tackle the problem later, the solution would look so
unlike the original code that your change is not hurting their effort.

-Peff