Re: fulltext search and hunspell

Hey,

thanks for your answer.

First I checked the links in the tsearch_data directory:
de_de.affix and de_de.dict are symlinks to the corresponding files in
/var/cache/postgresql/dicts/.
Then I recreated them using pg_updatedicts.
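
The symlink check described above can be sketched in Python. This is only a minimal illustration: the helper name check_dict_links is hypothetical, and the directory is the one that appears in the error message further down.

```python
import os

def check_dict_links(directory, names=("de_de.affix", "de_de.dict")):
    """Return {name: (is_symlink, resolved_target, target_exists)}."""
    report = {}
    for name in names:
        path = os.path.join(directory, name)
        is_link = os.path.islink(path)        # True only for symlinks
        target = os.path.realpath(path)       # follow the link chain
        report[name] = (is_link, target, os.path.exists(target))
    return report

if __name__ == "__main__":
    # Directory taken from the error message in this thread.
    tsearch_dir = "/usr/share/postgresql/8.4/tsearch_data"
    for name, info in check_dict_links(tsearch_dir).items():
        print(name, info)
```

A dangling link shows up as is_symlink True with target_exists False.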

Here is an excerpt from the de_de.affix file:

# this is the affix file of the de_DE Hunspell dictionary
# derived from the igerman98 dictionary
#
# Version: 20091006 (build 20100127)
#
# Copyright (C) 1998-2009 Bjoern Jacke <bjoern@xxxxxx>
#
# License: GPLv2, GPLv3 or OASIS distribution license agreement
# There should be a copy of both of this licenses included
# with every distribution of this dictionary. Modified
# versions using the GPL may only include the GPL

SET ISO8859-1
TRY esijanrtolcdugmphbyfvkwqxzäüößáéêàâñESIJANRTOLCDUGMPHBYFVKWQXZÄÜÖÉ-.

PFX U Y 1
PFX U   0     un       .

PFX V Y 1
PFX V   0     ver      .

SFX F Y 35
[...]

I could not find "compoundwords controlled z" there, so I added it manually:

[...]
# versions using the GPL may only include the GPL

compoundwords  controlled z

SET ISO8859-1
TRY esijanrtolcdugmphbyfvkwqxzäüößáéêàâñESIJANRTOLCDUGMPHBYFVKWQXZÄÜÖÉ-.
[...]

Then I restarted PostgreSQL.

Now I get an error:
SELECT * FROM ts_debug('Schokoladenfabrik');
FEHLER:  falsches Affixdateiformat für Flag
CONTEXT:  Zeile 18 in Konfigurationsdatei
»/usr/share/postgresql/8.4/tsearch_data/de_de.affix«: »PFX U Y 1
«
SQL-Funktion »ts_debug« Anweisung 1
SQL-Funktion »ts_debug« Anweisung 1

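Which means:
ERROR:  wrong affix file format for flag
CONTEXT:  line 18 of configuration file
"/usr/share/postgresql/8.4/tsearch_data/de_de.affix": "PFX U Y 1
"
SQL function "ts_debug" statement 1
SQL function "ts_debug" statement 1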
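
A plausible reading of this error is that the file now mixes two affix syntaxes: "compoundwords controlled z" is ispell-style, while PFX/SFX lines are MySpell/Hunspell-style. The Python sketch below flags such a mix. The function name classify_affix_lines and both keyword sets are assumptions made for illustration; they are not the complete lists recognised by PostgreSQL or Hunspell.

```python
# Assumption: illustrative keyword sets, not exhaustive.
ISPELL_STYLE = {"compoundwords", "suffixes", "prefixes", "flag"}
HUNSPELL_STYLE = {"SET", "TRY", "PFX", "SFX", "COMPOUNDFLAG", "ONLYINCOMPOUND"}

def classify_affix_lines(lines):
    """Return (ispell_hits, hunspell_hits) as lists of (line_no, text)."""
    ispell_hits, hunspell_hits = [], []
    for line_no, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        keyword = text.split()[0]  # first token decides the style
        if keyword in ISPELL_STYLE:
            ispell_hits.append((line_no, text))
        elif keyword in HUNSPELL_STYLE:
            hunspell_hits.append((line_no, text))
    return ispell_hits, hunspell_hits

# Lines modelled on the edited excerpt quoted in this message.
sample = [
    "# versions using the GPL may only include the GPL",
    "",
    "compoundwords  controlled z",
    "",
    "SET ISO8859-1",
    "PFX U Y 1",
    "PFX U   0     un       .",
]

ispell_hits, hunspell_hits = classify_affix_lines(sample)
if ispell_hits and hunspell_hits:
    print("mixed directive styles found:")
    for line_no, text in ispell_hits + hunspell_hits:
        print(f"  line {line_no}: {text}")
```

To run it against the real file, read the lines with the encoding named in its SET line (ISO8859-1 here).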

If I instead add

COMPOUNDFLAG Z
ONLYINCOMPOUND L

in place of "compoundwords  controlled z", I do not get an error:

SELECT * FROM ts_debug('Schokoladenfabrik');
   alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
-----------+-----------------+-------------------+-------------------------------+-------------+-------------------
 asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
(1 row)

But the hunspell dictionary still does not seem to handle compound words:
german_hunspell is consulted, yet the lexeme comes from german_stem
({schokoladenfabr}).
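
A COMPOUNDFLAG line can only have an effect if entries in the .dict file actually carry that flag. Whether Z and L are the flags this dictionary uses is not established here, so one quick check is to count the entries per flag. The Python sketch below assumes single-character flags in the usual word/FLAGS layout; the function name count_entries_with_flag and the sample entries are hypothetical.

```python
def count_entries_with_flag(dict_lines, flag):
    """Count .dict entries whose flag field contains the given flag.

    Assumes single-character flags written as word/FLAGS.
    """
    count = 0
    for line in dict_lines:
        entry = line.strip()
        if not entry or entry.isdigit():
            continue  # skip blank lines and the leading entry-count line
        head = entry.split()[0]            # drop any trailing extra fields
        word, sep, flags = head.partition("/")
        if sep and flag in flags:
            count += 1
    return count

# Hypothetical entries, only to show the expected layout.
sample_dict = ["3", "Schokolade/N", "Fabrik/NZ", "und"]

for flag in ("Z", "L"):
    print(flag, count_entries_with_flag(sample_dict, flag))
```

A count of zero for the flag named in COMPOUNDFLAG would suggest the directive does not match the dictionary's own flag scheme.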

Maybe pg_updatedicts has a bug and generates affix files in the wrong format?

Jens

2011/2/7 Oleg Bartunov <oleg@xxxxxxxxxx>:
> Jens,
>
> could you check affix file for
> compoundwords  controlled z
>
> also, can you provide link to dictionary files, so we can check if they
> supported, since we have only rudiment support of hunspell.
> btw,it'd be nice to have output from ts_debug() to make sure dictionaries
> actually used.
>
> Oleg

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


