Hunspell as filtering dictionary


Hi,
I am trying to create a tsvector from a French text. These are the operations that seem logical to perform, in this order:

1. remove stopwords
2. use hunspell to find word roots
3. unaccent

I first tried:

CREATE TEXT SEARCH CONFIGURATION fr_conf (copy='simple');

ALTER TEXT SEARCH CONFIGURATION fr_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH unaccent, french_hunspell;


select * from to_tsvector('fr_conf', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':5 'aventurer':5 'rencontrer':3


But the verb "découvrir" is missing :(
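
For anyone reproducing this, ts_debug is one way to see what happens to each token before it reaches the tsvector. The query below is only a diagnostic sketch: it assumes the fr_conf configuration defined above exists and that the unaccent and french_hunspell dictionaries are installed under those names.

```sql
-- Diagnostic sketch: list, for every token, the dictionaries consulted,
-- the one that recognized it (if any), and the lexemes it produced.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('fr_conf', E'Pour découvrir et rencontrer l\'aventure.');
```

A row where dictionary and lexemes are both NULL means that no dictionary in the list recognized the token, so it is not indexed. An empty lexemes array ({}) means the token was recognized as a stop word.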


If I try with french_hunspell only, I get it, but with the accent:


select * from to_tsvector('french_hunspell', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':6 'aventurer':6 'découvrir':2 'rencontrer':4


I also tried:

CREATE TEXT SEARCH CONFIGURATION fr_conf2 (copy='simple');

ALTER TEXT SEARCH CONFIGURATION fr_conf2
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH french_hunspell, unaccent;


select * from to_tsvector('fr_conf2', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':5 'aventurer':5 'rencontrer':3


But I guess unaccent is never called.
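
That guess could be checked one dictionary at a time with ts_lexize. This is a sketch under the same assumptions as above (dictionaries named unaccent and french_hunspell); no output is shown because it depends on the installed dictionary files.

```sql
-- Probe each dictionary separately with the accented and unaccented forms.
SELECT ts_lexize('unaccent', 'découvrir');         -- what unaccent hands on
SELECT ts_lexize('french_hunspell', 'découvrir');  -- accented form
SELECT ts_lexize('french_hunspell', 'decouvrir');  -- unaccented form
```

ts_lexize returns an array of lexemes when the dictionary knows the word, an empty array when the word is a stop word, and NULL when the word is unknown to that dictionary.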

I believe this is because french_hunspell is not a filtering dictionary, but I might be wrong. So is there a way to get this result from any FTS configuration (existing or :

-- 'aventure':6 'aventurer':6 'decouvrir':2 'rencontrer':4


Thanks,

Bertrand
