Alban,
the documentation you're referring to is for the upcoming 8.3 release.
For 8.1 and 8.2 you need to set up all the machinery by hand. It's not
difficult; for example:
-- sample tsearch2 configuration for search.postgresql.org
-- Creates configuration 'pg' - default, should match server's locale !!!
-- Change 'ru_RU.UTF-8' to your server's locale
begin;
-- create special (default) configuration 'pg'
update pg_ts_cfg set locale=NULL where locale = 'ru_RU.UTF-8';
insert into pg_ts_cfg values('pg','default','ru_RU.UTF-8');
-- register 'pg_dict' dictionary using synonym template;
-- pg_dict.txt holds one "word synonym" pair per line:
--   postgres pg
--   pgsql pg
--   postgresql pg
insert into pg_ts_dict
(select 'pg_dict',dict_init,
'/usr/local/pgsql-dev/share/contrib/pg_dict.txt',
dict_lexize, 'pg-specific dictionary'
from pg_ts_dict
where dict_name='synonym'
);
-- register ispell dictionary, check paths and stop words
-- I converted the English files to UTF-8 with iconv, since there is some Cyrillic content
insert into pg_ts_dict
(SELECT 'en_ispell', dict_init,
'DictFile="/usr/local/share/dicts/ispell/utf8/english-utf8.dict",'
'AffFile="/usr/local/share/dicts/ispell/utf8/english-utf8.aff",'
'StopFile="/usr/local/share/dicts/ispell/utf8/english-utf8.stop"',
dict_lexize
FROM pg_ts_dict
WHERE dict_name = 'ispell_template'
);
-- use the same stop-word list as 'en_ispell' dictionary
UPDATE pg_ts_dict set dict_initoption='/usr/local/share/dicts/english.stop'
where dict_name='en_stem';
-- default token<->dicts mappings
insert into pg_ts_cfgmap select 'pg', tok_alias, dict_name from public.pg_ts_cfgmap where ts_name='default';
-- modify mappings for latin words for configuration 'pg'
update pg_ts_cfgmap set dict_name = '{pg_dict,en_ispell,en_stem}'
where tok_alias in ( 'lword', 'lhword', 'lpart_hword' )
and ts_name = 'pg';
-- we won't index/search some tokens
update pg_ts_cfgmap set dict_name = NULL
--where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float','word')
where tok_alias in ('email', 'url', 'sfloat', 'uri', 'float')
and ts_name = 'pg';
end;
-- testing
select * from ts_debug('
PostgreSQL, the highly scalable, SQL compliant, open source object-relational
database management system, is now undergoing beta testing of the next
version of our software: PostgreSQL 8.2.
');
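If ts_debug does not show what you expect, the pieces can be checked one at
a time. The following is only a sketch: it assumes the tsearch2 contrib
functions lexize(), set_curcfg() and to_tsvector() are installed in the
database, and the sample words are just illustrations.

```sql
-- check that the synonym dictionary recognises one of its entries
select lexize('pg_dict', 'postgresql');

-- check that the ispell dictionary loads and normalises an English word
select lexize('en_ispell', 'stories');

-- make 'pg' the current configuration for this session, in case the
-- locale stored in pg_ts_cfg does not match the server's locale
select set_curcfg('pg');

-- or name the configuration explicitly
select to_tsvector('pg', 'PostgreSQL 8.2 is now undergoing beta testing');
```

A lexize() call that returns NULL means the dictionary does not know the
word, so the next dictionary in the mapping gets a chance at it.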
Oleg
On Wed, 3 Oct 2007, Alban Hertroys wrote:
Hello,
I'm trying to get a Dutch snowball stemmer in Postgres 8.1, but I can't
find how to do that.
I found CREATE FULLTEXT DICTIONARY commands in the tsearch2 docs on
http://www.sai.msu.su/~megera/postgres/fts/doc/index.html, but these
commands are apparently not available on PG8.1.
I also found the tables pg_ts_(cfg|cfgmap|dict|parser), but I have no
idea how to add a Dutch stemmer to those.
I did find some references to stem.[ch] files that were suggested to
compile into the postgres sources, but I cannot believe that's the right
way to do this (besides that I don't have sufficient privileges to
install such a version).
So... How do I do this?
The system involved is some version of Debian Linux (2.6 kernel); are
there any packages for a Dutch stemmer maybe?
I'm in a bit of a hurry too, as we're on a tight deadline :(
Regards,
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83