Re: LC_CTYPE and matching accented chars

Michael Fuhr <mike@xxxxxxxx> · Wed, 27 Jun 2007 01:59:05 -0600

On Wed, Jun 27, 2007 at 09:28:24AM +1200, Martin Langhoff wrote:
> Alvaro Herrera wrote:
> > I think it would be much easier if you did something like
> > 
> > select * from test where lower(to_ascii(value)) = lower(to_ascii('martín'));
> > 
> > When to_ascii doesn't work (for example because it doesn't work in UTF8)
> > you may want to use convert() to recode the text to latin1 or latin9.
> 
> Well, with the example above to_ascii doesn't work.
> 
>   select to_ascii(value) from test ;
>   ERROR:  encoding conversion from UTF8 to ASCII not supported
> 
> And neither does convert
> 
>   select convert(value using utf8_to_ascii) from test ;
>   ERROR:  character 0xc3 of encoding "MULE_INTERNAL" has no equivalent
>    in "SQL_ASCII"

As Alvaro suggested, try converting to latin1 or latin9 and then
calling to_ascii:

select 'martin' = to_ascii(convert('martín', 'latin1'), 'latin1');
 ?column? 
----------
 t
(1 row)

For other possibilities search the list archives for examples of
"unaccent" functions that normalize text to NFD (Unicode Normalization
Form D) and remove nonspacing marks.  Here's a message with a couple
of PL/Perl functions:

http://archives.postgresql.org/pgsql-general/2007-01/msg00702.php

-- 
Michael Fuhr