Re: spanish locale question

Andreas Joseph Krogh <andreak@xxxxxxxxxxxx> · Fri, 04 May 2012 20:54:18 +0200

On 05/04/2012 07:31 PM, Tom Lane wrote:
> Al Eridani <al.eridani@xxxxxxxxx> writes:
>> What Tulio is saying is that 'leon' and 'león' are the same thing from
>> the point of view of sorting in Spanish, but his PostgreSQL seems to
>> think that 'leon' goes before 'león'.
> Postgres never considers that two distinct strings are "equal".  If the
> locale setting considers these equal (which isn't entirely clear from
> the given evidence), PG would then sort them on the basis of their
> character code values.
>
> A possible workaround if you need to consider them equal is to strip the
> accents before sorting (ie, something like "ORDER BY to_ascii(col)") but
> this may well throw away more information than you want ...

Note that to_ascii() fails on UTF8-encoded input (it only supports
conversion from LATIN1, LATIN2, LATIN9 and WIN1250):

ERROR:  encoding conversion from UTF8 to ASCII not supported

It is better to install the unaccent contrib module, which works with UTF8:

$ cd ./postgresql-9.1.2/contrib/unaccent
$ make install
$ psql
andreak=# CREATE EXTENSION unaccent;
andreak=# select unaccent('león');
 unaccent
----------
 leon
(1 row)
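
With the extension in place, unaccent() can be used directly as the sort key in place of to_ascii(). A minimal sketch, assuming a hypothetical table "animals" with a text column "name" (neither comes from the original question). The second sort key is my own addition: it keeps the order deterministic when two values unaccent to the same string.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE animals (name text);
INSERT INTO animals VALUES ('león'), ('leon'), ('lobo'), ('águila');

-- Sort on the accent-stripped value first, then on the original value,
-- so rows that unaccent to the same string still come out in a fixed order.
SELECT name
FROM animals
ORDER BY unaccent(name), name;
```

How 'leon' and 'león' are ordered relative to each other still depends on the database collation, so this only guarantees that they end up next to each other.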

--
Andreas Joseph Krogh<andreak@xxxxxxxxxxxx>  - mob: +47 909 56 963
Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)