The problem is that my string, which is in UTF-8 because all input is first converted in PHP with $str_out = mb_convert_encoding($str_in, "UTF-8");, goes into a query like

    "select wordid from korean_english where word='utf8string'";

and that query returns wordids for words that are not equal to utf8string.

In debug mode, PHP outputs the query above to the browser as UTF-8 (UTF-8 is also the browser encoding) and then calls "exit;", so I just copy and paste the whole query string from the browser. Running the query from PHP and from psql returns the same bad wordids, which suggests the encoding stays consistent through the copy-and-paste.

I don't understand what a "Unicode normalization form" is. The Postgres docs at http://www.postgresql.org/docs/8.0/interactive/multibyte.html say:

    Table 20-1. Server Character Sets
    Name       Description
    UNICODE    Unicode (UTF-8)

so I thought UNICODE and UTF-8 were the same thing, and I don't know about "Unicode normalization forms".

My question is: why doesn't the utf8string in the query return only the matching wordids from the database?

thx.

On 2006-03-24 (Fri), 08:56 -0500, John D. Burger wrote:
> > i have a problem matching a utf8 string with a field in a database
> > encoded in utf8.
>
> You seem to give all the details of your configuration, but unless I
> misread your message, you don't say what the actual problem is. Can
> you provide more details? What exactly doesn't work?
>
> This may not be the issue, but many people don't realize that there are
> sometimes multiple ways to encode what is conceptually the same string
> in UTF8 (or any of the Unicode encodings). If you do not canonicalize
> your strings using one of the Unicode normalization forms, then
> seemingly identical strings may not match, because they are not
> byte-for-byte identical.
>
> - John D. Burger
> MITRE
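To make Burger's point about normalization forms concrete: Korean text is a case where it can matter. A Hangul syllable such as 한 can be stored either as one precomposed code point (U+D55C) or as a sequence of three conjoining jamo (U+1112 U+1161 U+11AB). Both display the same, but their UTF-8 bytes differ, so a byte-for-byte comparison treats them as different strings. This is a minimal sketch in Python, chosen only because its standard unicodedata module makes the point easy to show. It is not the poster's PHP code, and it does not establish that normalization is what causes the wrong wordids here. (In PHP, the intl extension's Normalizer::normalize() provides the corresponding operation.)

```python
import unicodedata


def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s as space-separated hex pairs."""
    return s.encode("utf-8").hex(" ")


# The syllable 한 as a single precomposed code point (U+D55C).
precomposed = "\uD55C"
# The same syllable spelled as three conjoining jamo:
# U+1112 (initial h), U+1161 (vowel a), U+11AB (final n).
decomposed = "\u1112\u1161\u11AB"

# Compared code point by code point (and therefore byte by byte in UTF-8),
# the two spellings are different strings.
print(precomposed == decomposed)  # False
print(utf8_hex(precomposed))      # ed 95 9c
print(utf8_hex(decomposed))       # e1 84 92 e1 85 a1 e1 86 ab

# Normalizing both to the same form (NFC here) makes them identical.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)             # True
print(utf8_hex(nfc_b))            # ed 95 9c

# NFD goes the other way and splits the syllable back into jamo.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

One common approach is to pick a single form (NFC is a frequent choice) and normalize both the stored words and the incoming search string to it before comparing, so that visually identical strings are also byte-identical.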