Hi,
First of all, excuse my poor English :)
I'm working on a full-text database of French historical writings, built
with tsearch2.
I'm using the fr_ispell dictionary, which can be found here:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
(ispell-french.tar.gz
<http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/dicts/ispell/ispell-french.tar.gz>
- submitted by Max Jacob)
The database encoding is LATIN1.
The problem is that the writings contain many names of public figures,
for example Churchill (the database covers WWII). When I search for
these names, nothing is found.
I have tried many things, including following this introduction:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
I think the root of the problem is that no lexeme is found (I could even
say that an empty lexeme is returned).
With the default en_stem dictionary, I get this:
SELECT lexize('en_stem', 'churchill');
"{churchil}"
Then I add the French dictionary:
INSERT INTO pg_ts_dict
(SELECT 'fr_ispell',
dict_init,
'DictFile="/home/.../french.dict",'
'AffFile="/home/.../french.aff",'
'StopFile="/home/.../french.stop"',
dict_lexize
FROM pg_ts_dict
WHERE dict_name = 'ispell_template');
And the result is:
SELECT lexize('fr_ispell', 'churchill');
""
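To tell a NULL result apart from an empty array, a check along these
lines could be run. This is only a sketch: I am assuming that lexize
returns a text[] value, and the column aliases is_null and is_empty are
just illustrative names.

```sql
-- Distinguish "no result at all" (NULL) from "empty lexeme list" ('{}').
-- Assumes lexize() returns text[]; is_null / is_empty are illustrative aliases.
SELECT lexize('fr_ispell', 'churchill') IS NULL         AS is_null,
       lexize('fr_ispell', 'churchill') = '{}'::text[]  AS is_empty;
```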
My questions are:
- Is it normal for lexize to return an empty result for a word that is
neither in the dictionary nor in the stop words?
- Is there a way to get the word itself as the result when it is
neither in the dictionary nor in the stop words?
- If yes, how?
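To make the second question concrete, this is the kind of fallback I
have in mind: a dictionary list in which a second dictionary receives
the words that fr_ispell does not recognise. It is an untested sketch. I
am assuming that the 'default' configuration is the one in use, that the
'simple' dictionary is installed, that these three token types are the
relevant ones, and that an unrecognised word is passed on to the next
dictionary in the list.

```sql
-- Untested sketch: try fr_ispell first, then fall back to 'simple'.
-- Assumptions: configuration name 'default', dictionary 'simple',
-- and the token types listed below; adjust to the actual setup.
UPDATE pg_ts_cfgmap
   SET dict_name = '{fr_ispell,simple}'
 WHERE ts_name = 'default'
   AND tok_alias IN ('lword', 'nlword', 'word');
```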
I'm also interested in any other information you could give me.
Many thanks!
Greg Maitrallain.
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)