Postgresql8.1.3 tsearch2 with UTF8

Hi,

My task is to upgrade our SuSE 8.2 / PostgreSQL 7.4.1 web server with
tsearch2 to SuSE 9.3 with PostgreSQL 8.1.3 and tsearch2.
The services are running, but I have some problems with the tsearch2
configuration.


-------------------------------------------------------------------------------------------------------------------------------
old system:
SuSE 8.2
PostgreSQL 7.4.1
tsearch2 (guide: tsearch2 reference at
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html)
From this guide we followed the chapters "Configuration" and "Parser".

new system:
SuSE 9.3
PostgreSQL 8.1.3
tsearch2 (two guides on tsearch2 with UTF-8)
-------------------------------------------------------------------------------------------------------------------------------


My steps:
1. I downloaded the new tsearch2.8.2.tar.gz for UTF-8 and replaced the
   tsearch2 folder.
2. I installed tsearch2 with make && make install, without problems.
3. locale = de_DE.UTF-8
4. I downloaded the *.med, *.aff and *.stop files from sai.msu.su/
   (tsearch2_german_utf8.zip, German ispell dictionary, UTF-8)
   and extracted them to /var/lib/ispell/.
5. I compiled the German Snowball stemmer with stem.c and stem.h (make &&
   make install) in /dict_de/..
6. After that I restored our database with psql -d codasdb -f dump.sql
   and psql -d codasdb -f tsearch2.sql
   and psql -d codasdb -f dict_de.sql
7. I set dict_initoption = '/var/lib/ispell/german.stop' where dict_name
   = 'de'; ???
8. INSERT INTO pg_ts_cfg (ts_name, prs_name, locale) VALUES
   ('default_german', 'default', 'de_DE.UTF-8');
   INSERT INTO pg_ts_dict (SELECT 'de_ispell',
                                dict_init,
                                'DictFile="/var/lib/ispell/german.med",'
                                'AffFile="/var/lib/ispell/german.aff",'
                                'StopFile="/var/lib/ispell/german.stop"',
                                dict_lexize
                                FROM pg_ts_dict
                                WHERE dict_name = 'ispell_template');
9. SELECT set_curdict('de_ispell'); <- this doesn't work with de_ispell, so I
   set it to ('de'); ???

select 'Our first string used today'::tsvector; <-- runs
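
A note on step 8: the three quoted fragments in the INSERT are adjacent string constants separated only by line breaks, which SQL joins into one value. The Python sketch below reproduces that joined value so it can be compared with the dict_initoption column in the pg_ts_dict listing at the end of this mail. The helper parse_init_option is made up for this illustration and is not how tsearch2 itself parses the option.

```python
# Adjacent string literals are joined into one string, in Python as in SQL
# (SQL additionally requires a line break between the pieces).
init_option = (
    'DictFile="/var/lib/ispell/german.med",'
    'AffFile="/var/lib/ispell/german.aff",'
    'StopFile="/var/lib/ispell/german.stop"'
)


def parse_init_option(option: str) -> dict:
    """Split 'Key="value",Key="value"' into a dict (illustrative helper only)."""
    pairs = (item.split("=", 1) for item in option.split(","))
    return {key.strip(): value.strip().strip('"') for key, value in pairs}


# Print the joined option string and the three paths it carries.
print(init_option)
for key, path in parse_init_option(init_option).items():
    print(key, "->", path)
```

If the printed string differs from what pg_ts_dict shows, a quote or comma probably went missing when the statement was typed.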


Now the problem is:
codasdb=# select to_tsvector('PostgreSQL ist weitgehend konform mit dem
SQL92/SQL99-Standard, d.h. alle in dem Standard geforderten Funktionen
stehen zur Verfuegung und verhalten sich so, wie vom Standard gefordert;
dies ist bei manchen kommerziellen sowie nichtkommerziellen SQL-Datenbanken
bisweilen nicht gegeben.');
ERROR:  invalid UTF-8 byte sequence detected near byte 0xe4
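
The sample sentence as shown here contains only ASCII characters, so the 0xe4 byte presumably does not come from the query text. 0xe4 is "ä" in ISO-8859-1, and on its own it is not a valid UTF-8 sequence, so one thing worth checking is whether the dictionary and stop-word files are really UTF-8 encoded. The Python sketch below does only that check; it is an assumption about where to look, not a confirmed diagnosis. The function name first_invalid_utf8 is made up for this example, and the paths are the ones from step 4.

```python
import os


def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


# Paths taken from step 4 of this mail; adjust if the files live elsewhere.
FILES = [
    "/var/lib/ispell/german.med",
    "/var/lib/ispell/german.aff",
    "/var/lib/ispell/german.stop",
]

if __name__ == "__main__":
    for path in FILES:
        if not os.path.exists(path):
            print(f"{path}: not found")
            continue
        # Read raw bytes so no implicit decoding hides the problem.
        with open(path, "rb") as handle:
            hit = first_invalid_utf8(handle.read())
        if hit is None:
            print(f"{path}: valid UTF-8")
        else:
            offset, byte = hit
            print(f"{path}: invalid UTF-8 at offset {offset}, byte 0x{byte:02x}")
```

A file reported with byte 0xe4 (or 0xf6, 0xfc, 0xdf) would point to an ISO-8859-1 encoded file. The same check could be applied to the stop-word file used by the 'de' Snowball dictionary.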


I've tested with two guides:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2_german_utf8.html
http://www.tauceti.net/roller/page/cetixx/20060401 (german)



Can anyone help?


Raphi





----------------------------------------------------------------------------------------------------------------------------------------------------------
Configuration:

codasdb=# select * from pg_ts_cfg;
     ts_name     | prs_name |    locale
-----------------+----------+--------------
 default         | default  | C
 default_russian | default  | ru_RU.KOI8-R
 utf8_russian    | default  | ru_RU.UTF-8
 simple          | default  |
 default_german  | default  | de_DE.UTF-8


codasdb=# \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 codasdb   | postgres | UTF8
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8



codasdb=# select * from pg_ts_dict;
    dict_name    |         dict_init          |       dict_initoption       |               dict_lexize               |                   dict_comment
-----------------+----------------------------+-----------------------------+-----------------------------------------+--------------------------------------------------
 simple          | dex_init(internal)         |                             | dex_lexize(internal,internal,integer)   | Simple example of dictionary.
 en_stem         | snb_en_init(internal)      | contrib/english.stop        | snb_lexize(internal,internal,integer)   | English Stemmer. Snowball.
 ru_stem_koi8    | snb_ru_init_koi8(internal) | contrib/russian.stop        | snb_lexize(internal,internal,integer)   | Russian Stemmer. Snowball. KOI8 Encoding
 ru_stem_utf8    | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8   | snb_lexize(internal,internal,integer)   | Russian Stemmer. Snowball. UTF8 Encoding
 ispell_template | spell_init(internal)       |                             | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
 synonym         | syn_init(internal)         |                             | syn_lexize(internal,internal,integer)   | Example of synonym dictionary
 de              | dinit_de(internal)         | /var/lib/ispell/german.stop | snb_lexize(internal,internal,integer)   | Snowball stemmer for German
 de_ispell       | spell_init(internal)       | DictFile="/var/lib/ispell/german.med",AffFile="/var/lib/ispell/german.aff",StopFile="/var/lib/ispell/german.stop" | spell_lexize(internal,internal,integer) |
(8 rows)
