Vincas Dargis wrote:
> We have problems (currently using 8.4, but also in latest 9.1.3) in
> our application with Unicode word symbols in Lithuanian ('ąčęėįšųūž'),
> Russian and of course potentially other languages.
>
> For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
>
> lower() and ~* comparison work only with the locale that is set (no
> internationalization).
>
> Could we expect Unicode support in the near future? Or should we do quick
> hacks by reimplementing regexp_replace(), lower(), upper() and other
> string SQL functions using, for example, Qt libraries..? Or are there
> maybe some kind of simpler workarounds?

I tried it with 9.1.3 on Linux. upper() and lower() work fine, no matter
what the database encoding is:

test=> SELECT upper('acząčž');
 upper
--------
 ACZĄČŽ
(1 row)

And this seems OK with LATIN7:

lt2=> SHOW server_encoding;
 server_encoding
-----------------
 LATIN7
(1 row)

lt2=> SHOW lc_ctype;
 lc_ctype
----------
 lt_LT
(1 row)

lt2=> SHOW lc_collate;
 lc_collate
------------
 lt_LT
(1 row)

lt2=> SELECT 'ą' ~* '\w';
 ?column?
----------
 t
(1 row)

But it looks wrong with UTF8:

lt=> SHOW server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

lt=> SHOW lc_ctype;
  lc_ctype
------------
 lt_LT.utf8
(1 row)

lt=> SHOW lc_collate;
 lc_collate
------------
 lt_LT.utf8
(1 row)

lt=> SELECT 'ą' ~* '\w';
 ?column?
----------
 f
(1 row)

Is that what you are complaining about?

Yours,
Laurenz Albe

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
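For comparison, here is a minimal Python sketch (not PostgreSQL code) of what Unicode-aware word-character matching would give for the example string from the thread. Python's `re` module treats `\w` as Unicode-aware for str patterns, while the `re.ASCII` flag restricts it to `[a-zA-Z0-9_]`. The ASCII-only result happens to reproduce the reported symptom (ąčž stripped), but this only illustrates the two semantics; it makes no claim about why the UTF8 database behaves the way it does.

```python
import re

s = "acząčž"

# Unicode-aware: for str patterns, \W matches only non-letters/digits in any
# script, so the Lithuanian letters survive and nothing is removed.
unicode_result = re.sub(r"\W", "", s)

# ASCII-only: with re.ASCII, \w is just [a-zA-Z0-9_], so ą, č and ž count as
# non-word characters and are stripped, mirroring the symptom in the thread.
ascii_result = re.sub(r"\W", "", s, flags=re.ASCII)

print(unicode_result)  # acząčž
print(ascii_result)    # acz
print(s.upper())       # ACZĄČŽ
print(bool(re.fullmatch(r"\w", "ą")))  # True
```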