Re: very slow left join

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ben wrote:
On Fri, 16 May 2008, Scott Marlowe wrote:

Well, I'm guessing that you aren't in locale=C and that the text

Correct, I am not. And my understanding is that by moving to the C locale, I would loose utf8 validation, so I don't want to go there. Though, it's news to me that I would get any kind of select performance boost with locale=C. Why would it help?

As far as I know the difference is that in the "C" locale PostgreSQL can use simple byte-ordinal-oriented rules for sorting, character access, etc. It can ignore the possibility of a character being more than one byte in size. It can also avoid having to consider pairs of characters where the ordinality of the numeric byte value of the characters is not the same as the ordinality of the characters in the locale (ie they don't sort in byte-value order).

If I've understood it correctly ( I don't use "C" locale databases myself and I have not tested any of this ) that means that two UTF-8 encoded strings stored in a "C" locale database might not compare how you expect. They might sort in a different order to what you expect, especially if one is a 2-byte or more char and the other is only 1 byte. They might compare non-equal even though they contain the same sequence of Unicode characters because one is in a decomposed form and one is in a precomposed form. The database neither knows the encoding of the strings nor cares about it; it's just treating them as byte sequences without any interest in their meaning.

If you only ever work with 7-bit ASCII, that might be OK. Ditto if you never rely on the database for text sorting and comparison.

Someone please yell at me if I've mistaken something here.

--
Craig Ringer


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux