On Tue, Apr 13, 2004 at 12:32:17PM -0400, Tom Lane wrote:
> Holger Klawitter <lists@klawitter.de> writes:
> > In order to avoid interaction with gcc, cat and others else I've written a
> > new program, reading from a file.
>
> After setting up the test case and duplicating your problem, I realized
> I was being dense :-( ... this is a well-known issue.  Need more
> caffeine before answering bug reports obviously ...
>
> The problem is that PG's upper() and lower() functions are based on
> the C library's <ctype.h> functions (toupper() and tolower()), which of
> course only work for single-byte character sets.  So they cannot work on
> UTF8 data.
>
> There has been some talk of rewriting these functions to use the
> <wctype.h> API where available, but no one's actually stepped up to the
> plate and done it.  IIRC the main sticking point was figuring out how to
> get from whatever character encoding the database is using into the wide
> character set representation the C library wants.  There doesn't seem to
> be a portable way of discovering exactly what the wchar encoding is
> supposed to be for the current locale setting.

There is "libcharset", a portable character set determination library.
But maintaining this library, with its large amount of OS-dependent code,
is probably not simple. It's used in the standard iconv.

    http://www.haible.de/bruno/packages-libcharset.html

But I'm not sure it would resolve anything, because there is no guarantee
of any connection between the current locale setting and the encoding of
a given string, for example:

    SELECT upper( convert('foo', 'X', 'Y') );

IMHO the solution is to add to "struct varlena" a pointer to pg_encname,
which knows how to handle PostgreSQL encoding information, and so make
each PostgreSQL string independent and self-describing. Or is there some
reason why this is useless?
Karel

-- 
Karel Zak  <zakkr@zf.jcu.cz>
http://home.zf.jcu.cz/~zakkr/