
Re: another seemingly simple encoding question

This doesn't sound like your problem, but I'll explain the normalization issue using Korean as an example, since that seems to be your data: Unicode has codepoints both for precomposed Hangul syllables and for the individual Jamo, so a Hangul syllable can be represented either by its single precomposed codepoint or by a sequence of two or three Jamo codepoints. A Unicode font would display the two alternatives identically, but in any Unicode encoding, including UTF8, the two strings would not be byte-for-byte identical. The four Unicode normalization forms (NFC, NFD, NFKC and NFKD) are algorithms that rewrite such strings into a canonical sequence so that they do compare identically.
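To make the Hangul case concrete, here is a minimal sketch using Python's standard `unicodedata` module. Python is used only for illustration, and the syllable 한 (U+D55C) is an arbitrary example, not taken from the original question:

```python
import unicodedata

# U+D55C HANGUL SYLLABLE HAN as a single precomposed codepoint.
precomposed = "\uD55C"
# The same syllable spelled with three conjoining Jamo:
# U+1112 CHOSEONG HIEUH + U+1161 JUNGSEONG A + U+11AB JONGSEONG NIEUN.
decomposed = "\u1112\u1161\u11AB"

# Different codepoint sequences, so different UTF-8 bytes (3 vs 9).
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 3 9
print(precomposed == decomposed)  # False

# NFC composes the Jamo; NFD decomposes the syllable. Either way they match.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

The point is that equality only holds after both strings have been put into the same normalization form; a plain comparison of the raw strings (or their bytes) sees two different sequences.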

Anyway, it sounds like you have the opposite problem: two strings that compare equal when you think they shouldn't. I don't know that anyone can help you unless you post an actual example of two such strings.
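Because look-alike characters are easy to lose in an email, one way to post such an example unambiguously is to list each string's codepoints together with their UTF-8 bytes. The sketch below does that with Python's `unicodedata`; `describe` is a hypothetical helper name, not an existing API:

```python
import unicodedata

def describe(s: str) -> list[str]:
    """Return one line per codepoint: U+XXXX, its UTF-8 bytes, its Unicode name."""
    lines = []
    for ch in s:
        name = unicodedata.name(ch, "<unnamed>")  # fallback for unnamed codepoints
        utf8 = ch.encode("utf-8").hex(" ")        # space-separated hex bytes
        lines.append(f"U+{ord(ch):04X}  {utf8}  {name}")
    return lines

for line in describe("\uD55C"):
    print(line)  # U+D55C  ed 95 9c  HANGUL SYLLABLE HAN
```

Running it on both of the strings that unexpectedly compare equal, and pasting the output, shows exactly which codepoints the database is being handed.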

- John D. Burger
  MITRE


