Mohamed,
please, try to read docs and think a bit first.
On Mon, 2 Feb 2009, Mohamed wrote:
On Mon, Feb 2, 2009 at 4:34 PM, Oleg Bartunov <oleg@xxxxxxxxxx> wrote:
On Mon, 2 Feb 2009, Oleg Bartunov wrote:
On Mon, 2 Feb 2009, Mohamed wrote:
Hehe, ok..
I don't know either but I took some lines from Al-Jazeera :
http://aljazeera.net/portal
I just made the change you said, created it successfully, and tried this:
select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ????
?????????
?????')
but I got nothing... :(
You did it wrong! ts_lexize expects a single word, not a phrase!
Mohamed, what did you expect from ts_lexize? Please give us useful
information, otherwise we can't help you.
What I expected was something to be returned. After all, they are valid words
taken from an article (perhaps you don't see the words, but only ???...).
Am I wrong to expect something? Should I go for setting up the
configuration completely first?
You should definitely read documentation
http://www.postgresql.org/docs/8.3/static/textsearch-debugging.html#TEXTSEARCH-DICTIONARY-TESTING
Period.
SELECT ts_lexize('norwegian_ispell',
'overbuljongterningpakkmesterassistent');
{over,buljong,terning,pakk,mester,assistent}
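To test a whole phrase rather than a single word, something along these lines may help. It is only a sketch: 'ayaspell' is the dictionary name used earlier in this thread, 'arabic' is assumed to be an already created text search configuration, and the phrase is a placeholder. As far as I know, ts_lexize returns NULL for a word the dictionary does not know and an empty array for a stop word.

```sql
-- Split the phrase on whitespace and test each word on its own,
-- since ts_lexize takes one token at a time.
-- 'ayaspell' and the placeholder phrase are assumptions for illustration.
SELECT word, ts_lexize('ayaspell', word) AS lexemes
FROM regexp_split_to_table('first second third', E'\\s+') AS word;

-- Alternatively, run the whole phrase through a configuration and see
-- which dictionary (if any) recognized each token.
-- 'arabic' is assumed to be an existing configuration.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('arabic', 'first second third');
```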
Check out this article if you need a sample.
http://www.aljazeera.net/NR/exeres/103CFC06-0195-47FD-A29F-2C84B5A15DD0.htm
Is there a way of making sure that words that are not recognized also get
indexed/searched for? (Not that I think this is the problem.)
yes
Read
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html
"A text search configuration binds a parser together with a set of
dictionaries to process the parser's output tokens. For each token type that
the parser can return, a separate list of dictionaries is specified by the
configuration. When a token of that type is found by the parser, each
dictionary in the list is consulted in turn, until some dictionary
recognizes it as a known word. If it is identified as a stop word, or if no
dictionary recognizes the token, it will be discarded and not indexed or
searched for. The general rule for configuring a list of dictionaries is to
place first the most narrow, most specific dictionary, then the more general
dictionaries,
finishing with a very general dictionary, like a Snowball stemmer or
simple, which recognizes everything."
Ok, but I don't have a Thesaurus or a Snowball stemmer to fall back on. So
when valid words are for some reason not recognized, they "will be discarded
and not indexed or searched for", which I consider a problem since I don't
trust my configuration to cover everything.
Is this not a valid concern?
quick example:
CREATE TEXT SEARCH CONFIGURATION arabic (
COPY = english
);
=# \dF+ arabic
Text search configuration "public.arabic"
Parser: "pg_catalog.default"
Token | Dictionaries
-----------------+--------------
asciihword | english_stem
asciiword | english_stem
email | simple
file | simple
float | simple
host | simple
hword | english_stem
hword_asciipart | english_stem
hword_numpart | simple
hword_part | english_stem
int | simple
numhword | simple
numword | simple
sfloat | simple
uint | simple
url | simple
url_path | simple
version | simple
word | english_stem
Then you can alter this configuration.
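As a sketch of such an alteration, assuming the ispell dictionary from earlier in this thread is named 'ayaspell': putting the built-in 'simple' dictionary last in the list gives a catch-all, so a token that ayaspell does not recognize is still indexed (lowercased) instead of being discarded.

```sql
-- Arabic words arrive as the non-ASCII token types word, hword, hword_part.
-- Try the ispell dictionary first; fall back to 'simple', which accepts
-- every token, so nothing is silently dropped.
-- 'ayaspell' is the dictionary name assumed from this thread.
ALTER TEXT SEARCH CONFIGURATION arabic
    ALTER MAPPING FOR word, hword, hword_part
    WITH ayaspell, simple;
```

The result can be checked with \dF+ arabic, as shown above.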
Yes, I figured that's the next step, but thought I should get ts_lexize to
work first. What do you think?
Just a thought, say I have this :
ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH pga_ardict, ar_ispell, ar_stem;
Is it possible to keep adding dictionaries, to get both Arabic and English
matches on the same column (Arabic speakers tend to mix), like this:
ALTER TEXT SEARCH CONFIGURATION pg
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
word, hword, hword_part
WITH pga_ardict, ar_ispell, ar_stem, pg_english_dict, english_ispell,
english_stem;
Will something like that work ?
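A note on ordering, going by the documentation quoted above: a dictionary that recognizes everything (a Snowball stemmer or simple) ends the chain, so if ar_stem is of that kind, the English dictionaries listed after it would never be consulted. One possible alternative, offered only as a sketch, is to split the mapping by token type, since the default parser reports ASCII-only words as asciiword and words containing non-ASCII letters as word. The names pg, ar_ispell and english_ispell are taken from the example above and are assumed to exist; english_stem and simple are built in.

```sql
-- ASCII-only tokens (English text) go to the English dictionaries.
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH english_ispell, english_stem;

-- Tokens containing non-ASCII letters (Arabic text) go to the Arabic
-- dictionary, with 'simple' as a catch-all at the end.
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR word, hword, hword_part
    WITH ar_ispell, simple;
```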
/ Moe
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83