Re: TSearch2: Problems with compound words and stop words


Timo,

take a look at the .aff file and search for 'compoundwords'.
The German ispell file I got from http://j3e.de/ispell/igerman98/ has no support for compound words: 'compoundwords off'


Norwegian, for example, has:

compoundwords controlled z

compoundmin 4
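
A quick way to run the check described above is to scan the affix file for its 'compoundwords' directive. The following is only a minimal sketch: the helper name compound_setting is made up for illustration, and treating '#' as a comment start is an assumption about the affix file format rather than something stated in this thread.

```python
import re
from typing import Optional


def compound_setting(aff_text: str) -> Optional[str]:
    """Return the argument of the 'compoundwords' directive, or None if absent."""
    for line in aff_text.splitlines():
        # Assumption: '#' starts a comment in the affix file; drop it first.
        line = line.split("#", 1)[0].strip()
        match = re.match(r"compoundwords\s+(.*)", line, flags=re.IGNORECASE)
        if match:
            return match.group(1).strip()
    return None


# The two settings quoted in this message.
norwegian_aff = "compoundwords controlled z\ncompoundmin 4\n"
german_aff = "compoundwords off\n"

print(compound_setting(norwegian_aff))  # controlled z
print(compound_setting(german_aff))     # off
```

For a real file, pass the contents of the .aff file instead of the inline strings. A result of 'off' or None would suggest the dictionary carries no compound information to work with.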


Oleg


On Wed, 17 Nov 2004, Oleg Bartunov wrote:

On Wed, 17 Nov 2004, Timo Haberkern wrote:

Sorry for the late answer, I was on holiday.

See my remarks below.


Oleg Bartunov wrote:

On Fri, 5 Nov 2004, Timo Haberkern wrote:

Oleg,

I use TSearch2 with PostgreSQL 7.4.6 and I applied the compound word patch yesterday. The configuration changed a little bit but the result is the same: I get no compound words. I'm using the locale de_DE with encoding ISO8859-1 for the database.

I think ispell is working correctly except for the compound words. If I try

SELECT lexize('de_ispell', 'springt')

i get

lexize
{springen,springen}

which seems correct.


But a SELECT lexize('de_ispell', 'Autobahn')

results in

lexize
{autobahn}

I would expect {auto,bahn,autobahn}
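
To make the expected result concrete, here is a rough sketch of two-part compound splitting against a word list. It is not how tsearch2 or ispell implements compound handling. The function name split_compound, the tiny lexicon and the minimum part length of 4 (borrowed from the 'compoundmin 4' line of the Norwegian affix file quoted elsewhere in this thread) are all illustrative assumptions, and real German compounds with linking letters are not covered.

```python
def split_compound(word, lexicon, min_part=4):
    """Return every two-part split whose halves are both known, then the
    whole word itself if it is known too."""
    word = word.lower()
    results = []
    # Try each split point that leaves at least min_part letters on both sides.
    for i in range(min_part, len(word) - min_part + 1):
        head, tail = word[:i], word[i:]
        if head in lexicon and tail in lexicon:
            results.extend([head, tail])
    if word in lexicon:
        results.append(word)
    return results


# Hypothetical mini word list standing in for the ispell dictionary.
lexicon = {"auto", "bahn", "autobahn", "fehler", "meldung"}

print(split_compound("Autobahn", lexicon))       # ['auto', 'bahn', 'autobahn']
print(split_compound("Fehlermeldung", lexicon))  # ['fehler', 'meldung']
```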


Hmm, have you checked 'Autobahn' in the ispell dictionary? Does the dictionary you used support the 'Z' flag for compound words?

Autobahn is in the ispell dictionary. What does an ispell dictionary need in order to support the Z flag?



Try ispell -C Autobahn, and search for 'compound' in 'man ispell' for details. The problem exists only if ispell *does* split the word correctly while tsearch2 doesn't. You should find a correct ispell dictionary for German or create it
yourself. You may consult monzilla.net:
http://staff.science.uva.nl/~christof/monzilla/research/project-dr.html




Timo








The new configuration after the compound word patch:


Seems you overestimate my capabilities :)



pg_ts_dict:

dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
simple | dex_init(text) | NULL | dex_lexize(internal,internal,integer) | Simple example of dictionary.
en_stem | snb_en_init(text) | /usr/local/pgsql/share/contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ru_stem | snb_ru_init(text) | /usr/local/pgsql/share/contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball.
ispell_template | spell_init(text) | NULL | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
synonym | syn_init(text) | NULL | syn_lexize(internal,internal,integer) | Example of synonym dictionary
de_ispell | spell_init(text) |
DictFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.dict",
AffFile="/usr/local/pgsql/share/contrib/dictonary/german_comb.aff",
StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | spell_lexize(internal,internal,integer) | NULL




Timo


Oleg Bartunov wrote:

Timo,

please check that you applied the patch for compound word support.
Which version of PostgreSQL are you using?
Does the ispell dictionary work for non-compound words?

    Oleg

On Fri, 5 Nov 2004, Timo Haberkern wrote:

Hi there,

I have some trouble with my TSearch2 installation. I did the
installation as described in
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_compound_words
I used the German myspell dictionary from
http://lingucomponent.openoffice.org/spell_dic.html and converted it with
my2ispell.


Nearly everything is working fine so far, except for two problems:

1.) The stop word file seems to be ignored. If I try SELECT
to_tsvector('default_german', 'ein Haus') I get     'ein':1 'haus':2

'ein' should be a stop word for German (and it is defined in the german.stop file as
well).


2.) The compound words feature doesn't work either. I have tried a lot of words,
e.g. 'Fehlermeldung'. With SELECT to_tsvector('default_german', 'Fehlermeldung')
I only get
'fehlermeldung':1 but I would expect 'fehler' and 'meldung' as separate
entries. Is there anything wrong with the dictionary or my configuration?
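
For the first problem, the thread does not establish a cause. One thing that may be worth ruling out is a stop file whose entries carry trailing spaces or Windows line endings, so that 'ein' never matches exactly. The sketch below only models that idea in plain Python. The helper names load_stopwords and filter_stopwords are hypothetical, and keeping the original token positions after filtering is an assumption about the desired output, not a description of tsearch2 internals.

```python
def load_stopwords(lines):
    """Normalise stop-file lines: strip whitespace and line endings,
    lowercase, and skip blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def filter_stopwords(tokens, stopwords):
    """Drop stop words but keep each remaining token's original position."""
    return [
        (token.lower(), pos)
        for pos, token in enumerate(tokens, start=1)
        if token.lower() not in stopwords
    ]


# Simulated stop-file lines, one with a trailing space and CRLF ending.
raw_lines = ["der\n", "ein \r\n", "\n"]
stop = load_stopwords(raw_lines)

print("ein \r\n" in raw_lines and "ein" not in raw_lines)  # True: raw entry is not exactly 'ein'
print(filter_stopwords(["ein", "Haus"], stop))             # [('haus', 2)]
```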



My current configuration:

pg_ts_cfg:

default    default    C
default_russian    default    ru_RU.KOI8-R
simple    default    NULL
default_german    default    de_DE.ISO8859-1

pg_ts_cfgmap:

default_german    host    {simple}
default_german    hword    {simple}
default_german    int    {simple}
default_german    nlhword    {simple}
default_german    nlpart_hword    {simple}
default_german    nlword    {simple}
default_german    part_hword    {simple}
default_german    sfloat    {simple}
default_german    uint    {simple}
default_german    uri    {simple}
default_german    url    {simple}
default_german    version    {simple}
default_german    word    {simple}
default_german    lpart_hword    {de_ispell,german_snowball}
default_german    lword    {de_ispell,german_snowball}
default_german    lhword    {de_ispell,german_snowball}


pg_ts_dict:

de_ispell | 17166 |
DictFile="/usr/local/pgsql/share/contrib/dictonary/german.dict",
AffFile="/usr/local/pgsql/share/contrib/dictonary/german.aff",
StopFile="/usr/local/pgsql/share/contrib/dictonary/german.stop" | 17167 | NULL
german_snowball | 17357 | NULL | 17162 | Snowball stemmer for german




Can anyone help me?

regards

Timo


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster



    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

