On Fri, Jan 12, 2007 at 10:58:36PM +0100, Martijn van Oosterhout wrote: > On Fri, Jan 12, 2007 at 10:16:22PM +0100, Jiří Němec wrote: > > I would like to remove diacritical marks from a string in a SQL query. > > I tried to convert a UTF8 string to ASCII but it doesn't work for me. > > > > SELECT convert('ěščřžýáíé','UTF8','SQL_ASCII') > > I don't think postgres has any stuff builtin for that, but other > languages (like perl) have modules to do this kind of thing. The method > is to decompose the string to normal form D, strip the diacritics, and > recompose what's left. A technique that's been posted before might work: SELECT to_ascii(convert('ěščřžýáíé', 'LATIN2'), 'LATIN2'); to_ascii ----------- escrzyaie (1 row) to_ascii() supports only LATIN1, LATIN2, LATIN9, and WIN1250 so you have to convert to one of those encodings first. As Martijn suggested, you could use Perl. Here's an example with Text::Unaccent (you'll need to use encoding names that iconv recognizes): CREATE FUNCTION unaccent(charset text, string text) RETURNS text AS $$ use Text::Unaccent; return unac_string($_[0], $_[1]); $$ LANGUAGE plperlu IMMUTABLE STRICT; SELECT unaccent('UTF-8', 'ěščřžýáíé'); unaccent ----------- escrzyaie (1 row) Here's an example that uses Unicode::Normalize to strip non-spacing and enclosing marks: CREATE FUNCTION unaccent(string text) RETURNS text AS $$ use Unicode::Normalize; my $nfd_string = NFD($_[0]); $nfd_string =~ s/[\p{Mn}\p{Me}]//g; return NFC($nfd_string); $$ LANGUAGE plperlu IMMUTABLE STRICT; SELECT unaccent('ěščřžýáíé'); unaccent ----------- escrzyaie (1 row) -- Michael Fuhr