Re: New to PostgreSQL, performance considerations

On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote:

I never understood what difference the ASCII/ISO-8859-1/UTF8
charsets make to a database. They're all simple C strings that don't have
the zero byte in the middle (like UTF16 would) and that don't
require any different processing unless you are doing case-insensitive
search (then you would have a problem).

That's not the whole story. UTF-8 and other variable-width encodings don't provide a 1:1 mapping of logical characters to single bytes. In particular, combining characters open the possibility of multiple different byte sequences representing the same logical character, so string comparison in such encodings generally cannot be done at the byte level (unless, of course, you first ascertain that the strings involved are all normalized to an unambiguous subset of your encoding).
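To make the comparison problem concrete, here is a minimal sketch in Python (not PostgreSQL code, and not a claim about how PostgreSQL compares strings internally). It spells "é" two ways, once precomposed and once with a combining accent, and shows that the UTF-8 bytes differ until both strings are normalized. The helper name nfc_equal is made up for this illustration.

```python
import unicodedata

# Two spellings of the same logical character "é".
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

precomposed_bytes = precomposed.encode("utf-8")  # b'\xc3\xa9'
decomposed_bytes = decomposed.encode("utf-8")    # b'e\xcc\x81'

# A byte-level comparison treats the two spellings as different strings.
bytes_equal = precomposed_bytes == decomposed_bytes


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Once both sides are normalized to the same form, they compare equal.
normalized_equal = nfc_equal(precomposed, decomposed)

print(bytes_equal, normalized_equal)  # prints: False True
```

Normalizing on input is one way to get the "unambiguous subset" mentioned above; whether a given application does so is up to that application.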

PostgreSQL's use of strings is not limited to string comparison. Substring extraction, concatenation, regular expression matching, up/downcasing, tokenization and so on are all part of PostgreSQL's small library of text manipulation functions, and all deal with logical characters, meaning they must be Unicode-aware.
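The same point can be sketched for length, substring and upcasing. The Python below is only an illustration of the byte-versus-character distinction, assuming UTF-8 input; it does not show PostgreSQL's own functions, and the variable names are made up for the example.

```python
text = "na\u00efve caf\u00e9"   # "naïve café", with precomposed ï and é
raw = text.encode("utf-8")

# Length: ten logical characters, but twelve bytes,
# because ï and é each take two bytes in UTF-8.
char_count = len(text)
byte_count = len(raw)

# Substring: the first three characters are "naï", but the first
# three bytes end in the middle of the two-byte sequence for ï.
char_prefix = text[:3]
try:
    raw[:3].decode("utf-8")
    byte_prefix_valid = True
except UnicodeDecodeError:
    byte_prefix_valid = False

# Upcasing: bytes.upper() only touches ASCII letters, so the
# accented characters are left in lower case.
chars_upper = text.upper()
bytes_upper = raw.upper().decode("utf-8")

print(char_count, byte_count)      # prints: 10 12
print(char_prefix, byte_prefix_valid)
print(chars_upper, bytes_upper)
```

A function that works on bytes alone gets all three of these wrong for non-ASCII input, which is why the database has to know the encoding of the text it stores.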

Alexander.

