Re: Tsearch2 and Unicode?

Hi!

Oleg, what exactly do you mean by "tsearch2 doesn't support unicode yet"? 

It seems to work fine in my database:

./pg_controldata [mycluster] gives me
pg_control version number:            72
[...]
LC_COLLATE:                           de_DE.UTF-8
LC_CTYPE:                             de_DE.UTF-8

community_unicode=# SELECT pg_encoding_to_char(encoding) AS encoding FROM pg_database WHERE datname='community_unicode';
 encoding
----------
 UNICODE
(1 row)

community_unicode=# select to_tsvector('default_german', 'Ich fände, daß das Fehlen von Umlauten ein Ärgernis wäre.');
                           to_tsvector
------------------------------------------------------------------
 'daß':3 'wäre':10 'fehlen':5 'fände':2 'umlauten':7 'Ärgernis':9
(1 row)

community_unicode=# SELECT  message_id
community_unicode-#  , rank(idxfti, to_tsquery('default_german', 'Könige|Söldner'),0) as rank
community_unicode-#  FROM ct_com_board_message
community_unicode-#  WHERE idxfti @@ to_tsquery('default_german', 'Könige|Söldner')
community_unicode-#  order by rank desc
community_unicode-#  limit 10;
 message_id |   rank
------------+----------
    3191632 | 0.686189
    2803233 | 0.686189
    2935325 | 0.686189
    2882337 | 0.686189
    2842006 | 0.686189
    2854329 | 0.686189
    2841962 | 0.686189
    2999851 | 0.651322
    2869839 | 0.651322
    2999799 |  0.61258
(10 rows)

These results look all right to me, so I cannot reproduce the disappearing special characters in a Unicode database. Dawid, are you sure you ran initdb on your cluster with the correct locale settings?
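
The same settings can also be checked from inside psql instead of pg_controldata. A minimal sketch, assuming a server version that exposes these read-only parameters through SHOW (that availability is an assumption and may differ between releases):

```sql
-- Encoding of the database you are currently connected to
SHOW server_encoding;

-- Locale settings fixed at initdb time (assumed to be exposed
-- as read-only parameters on your server version)
SHOW lc_collate;
SHOW lc_ctype;
```

If lc_ctype comes back as C or as a non-UTF-8 locale while the database encoding is UNICODE, that mismatch would be the first thing I would look at.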

Kind regards

   Markus

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx 
> [mailto:pgsql-general-owner@xxxxxxxxxxxxxx] On Behalf Of 
> Oleg Bartunov
> Sent: Wednesday, 17 November 2004 17:32
> To: Dawid Kuroczko
> Cc: Pgsql General
> Subject: Re:  Tsearch2 and Unicode?
> 
> Dawid,
> 
> unfortunately, tsearch2 doesn't support unicode yet.
> If you keep tsvector separately from data then you'll need 
> one more join.
> 
>  	Oleg
> 


