Re: Tsearch2 Dutch snowball stemmer in PG8.1

Alban,

the documentation you're referring to is for the upcoming 8.3 release.
For 8.1 and 8.2 you need to set up all the machinery by hand. It's not difficult, for example:

-- sample tsearch2 configuration for search.postgresql.org
-- Creates configuration 'pg' as the default; its locale should match the server's locale!
-- Change 'ru_RU.UTF-8' below to your server's locale.

begin;

-- create special (default) configuration 'pg'
update pg_ts_cfg set locale=NULL where locale = 'ru_RU.UTF-8';
insert into pg_ts_cfg values('pg','default','ru_RU.UTF-8');

-- register 'pg_dict' dictionary using synonym template
-- postgres    pg
-- pgsql       pg
-- postgresql  pg
insert into pg_ts_dict
(select 'pg_dict',dict_init,
'/usr/local/pgsql-dev/share/contrib/pg_dict.txt',
dict_lexize, 'pg-specific dictionary'
from pg_ts_dict
where dict_name='synonym'
);

-- register ispell dictionary; check the paths and the stop-word file
-- I used iconv on the English files, since there is some Cyrillic stuff
insert into pg_ts_dict
(SELECT 'en_ispell', dict_init,
'DictFile="/usr/local/share/dicts/ispell/utf8/english-utf8.dict",'
 'AffFile="/usr/local/share/dicts/ispell/utf8/english-utf8.aff",'
 'StopFile="/usr/local/share/dicts/ispell/utf8/english-utf8.stop"',
 dict_lexize
 FROM pg_ts_dict
 WHERE dict_name = 'ispell_template'
 );

 -- use the same stop-word list as 'en_ispell' dictionary
UPDATE pg_ts_dict set dict_initoption='/usr/local/share/dicts/english.stop'
where dict_name='en_stem';



-- default token<->dicts mappings
insert into pg_ts_cfgmap  select 'pg', tok_alias, dict_name from public.pg_ts_cfgmap where ts_name='default';

-- modify mappings for latin words for configuration 'pg'
update pg_ts_cfgmap set dict_name = '{pg_dict,en_ispell,en_stem}'
where tok_alias in ( 'lword', 'lhword', 'lpart_hword' )
and ts_name = 'pg';

-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
--where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float')
and ts_name = 'pg';

end;
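
Before running the full ts_debug test, each dictionary can be checked on its own with tsearch2's lexize() function. This is a minimal sketch, assuming the script above ran without errors and the dictionary files exist at the paths it names; the sample words are arbitrary.

```sql
-- Each call returns the lexemes one dictionary produces for a single word.
-- A NULL result means the dictionary does not recognize the word;
-- an empty array means the word was treated as a stop word.
select lexize('pg_dict', 'postgresql');   -- synonym dictionary
select lexize('en_ispell', 'stories');    -- ispell dictionary
select lexize('en_stem', 'testing');      -- snowball stemmer

-- Run a whole string through the new 'pg' configuration.
select to_tsvector('pg', 'PostgreSQL 8.2 is now undergoing beta testing');
```

If a dictionary returns NULL for a word you expected it to handle, the file paths in its dict_initoption are the first thing to check.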

-- testing

select * from ts_debug('
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');
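
The same pattern could be adapted for Dutch. The sketch below assumes a Dutch snowball dictionary has already been compiled and registered in pg_ts_dict; the names 'nl' and 'nl_stem' and the locale 'nl_NL' are hypothetical placeholders, not something tsearch2 ships with on 8.1. Use the value reported by `show lc_ctype;` for the locale.

```sql
begin;

-- create configuration 'nl' for the server's locale
-- ('nl_NL' is an assumed value; replace it with your own locale)
update pg_ts_cfg set locale = NULL where locale = 'nl_NL';
insert into pg_ts_cfg values ('nl', 'default', 'nl_NL');

-- start from the default token<->dictionary mappings
insert into pg_ts_cfgmap
select 'nl', tok_alias, dict_name from pg_ts_cfgmap where ts_name = 'default';

-- send latin words to the Dutch stemmer
-- ('nl_stem' is a hypothetical name and must already exist in pg_ts_dict)
update pg_ts_cfgmap set dict_name = '{nl_stem}'
where tok_alias in ('lword', 'lhword', 'lpart_hword')
and ts_name = 'nl';

end;

-- check the result
select to_tsvector('nl', 'De katten lopen door de straten');
```

Words containing non-ASCII letters may be reported by the parser under other token types (for example nlword), so it is worth running ts_debug on real Dutch text to see which mappings also need the stemmer.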


Oleg

On Wed, 3 Oct 2007, Alban Hertroys wrote:

Hello,

I'm trying to get a Dutch snowball stemmer in Postgres 8.1, but I can't
find how to do that.

I found CREATE FULLTEXT DICTIONARY commands in the tsearch2 docs on
http://www.sai.msu.su/~megera/postgres/fts/doc/index.html, but these
commands are apparently not available on PG8.1.

I also found the tables pg_ts_(cfg|cfgmap|dict|parser), but I have no
idea how to add a Dutch stemmer to those.

I did find some references to stem.[ch] files that were suggested to
compile into the postgres sources, but I cannot believe that's the right
way to do this (besides that I don't have sufficient privileges to
install such a version).

So... How do I do this?

The system involved is some version of Debian Linux (2.6 kernel); are
there any packages for a Dutch stemmer maybe?

I'm in a bit of a hurry too, as we're on a tight deadline :(

Regards,


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
