Re: Unicode normalization

No,

I need a solution that is as generic as possible. I use UTF-8 encoded Unicode strings at every level. This is what I have done so far:


1) Writing a separate Python command line script for testing - works as expected:

#!/usr/bin/python

import sys
import unicodedata

str = sys.argv[1].decode('UTF-8')
str = unicodedata.normalize('NFKD', str)
str = ''.join(c for c in str if unicodedata.combining(c) == 0)
print str
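
For readers on Python 3, the script above could look roughly like this. It is a sketch, not part of the original message: command-line arguments already arrive as `str`, so the `decode` step disappears, and the function name `strip_diacritics` is made up for illustration.

```python
#!/usr/bin/env python3
import sys
import unicodedata


def strip_diacritics(text):
    """Decompose text with NFKD and drop all combining marks."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(c for c in decomposed if unicodedata.combining(c) == 0)


# Only act as a command-line tool when an argument was actually given.
if __name__ == '__main__' and len(sys.argv) > 1:
    print(strip_diacritics(sys.argv[1]))
```

Called with the string from the failing query, `strip_diacritics('aÄÖÜ')` returns `'aAOU'`.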


2) Transferring this to PL/Python:

CREATE OR REPLACE FUNCTION test (str text)
 RETURNS text
AS $$
   import unicodedata
   return unicodedata.normalize('NFKD', str.decode('UTF-8'))
$$ LANGUAGE plpythonu;

Problem: PL/Python raises an error where my command-line script produced the correct result:

# select test('aÄÖÜ');

ERROR:  plpython: function "test" could not create return value
DETAIL: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\u0308' in position 2: ordinal not in range(128)



I use PG 8.3 and Python 2.5.2. How can I make PL/Python behave like a normal Python environment?
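
The error message is consistent with the returned unicode object being converted with Python 2's default ASCII codec, which cannot represent the combining marks that NFKD produces. That reading is an assumption about PL/Python on 8.3, not something confirmed in this thread. The sketch below (Python 3, plain interpreter, hypothetical helper name `normalize_to_utf8`) reproduces the same failure position and shows one possible workaround: encode the result explicitly as UTF-8 instead of handing back a unicode object. In the Python 2 function body this would roughly correspond to appending `.encode('UTF-8')` to the return expression.

```python
import unicodedata


def normalize_to_utf8(text):
    """NFKD-normalize text and return it explicitly encoded as UTF-8 bytes."""
    return unicodedata.normalize('NFKD', text).encode('UTF-8')


decomposed = unicodedata.normalize('NFKD', 'aÄÖÜ')

# An ASCII conversion fails on the first combining diaeresis (U+0308).
try:
    decomposed.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 succeeds and round-trips.
utf8_bytes = normalize_to_utf8('aÄÖÜ')
```

Here `ascii_error.start` is 2, matching the "position 2" in the reported error, because `Ä` decomposes into `A` followed by U+0308.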


In the end it should look like this:

CREATE TABLE t (
...
ts tsvector NOT NULL
);

INSERT INTO t (ts) VALUES(to_tsvector(normalize(?)));

Andi


David Fetter wrote:
On Wed, Sep 16, 2009 at 07:20:21PM +0200, Andreas Kalsch wrote:
Has anybody integrated Unicode normalization into Postgres? If not, I would have to implement my own function using this CPAN module: http://search.cpan.org/~sadahiro/Unicode-Normalize-1.03/ .

I need a function which removes all diacritics (1) and transforms some characters to a more compatible form (2) to get a better index on strings.

Best,

Andi


1) à,ä, ... => a
2) ø => o, ƒ => f, ª => a
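
The two cases behave differently under normalization. Case 1 and `ª` are covered by NFKD decomposition, but `ø` and `ƒ` have no Unicode decomposition, so they would need an explicit mapping. A minimal Python 3 sketch, where the table `EXTRA_FOLDS` and the function `fold` are hypothetical names and the table holds little more than the characters listed above:

```python
import unicodedata

# Hand-written folds for letters that NFKD leaves untouched.
# Illustrative only; a real table would need many more entries.
EXTRA_FOLDS = str.maketrans({'ø': 'o', 'Ø': 'O', 'ƒ': 'f'})


def fold(text):
    """Strip combining marks via NFKD, then apply the explicit folds."""
    decomposed = unicodedata.normalize('NFKD', text)
    stripped = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.translate(EXTRA_FOLDS)
```

With this sketch, `fold('àäøƒª')` returns `'aaofa'`.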

You mean something like this?

http://wiki.postgresql.org/wiki/Strip_accents_from_strings%2C_and_output_in_lowercase

Cheers,
David.


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
