Search Postgresql Archives

Re: [tsvector] to_tsvector called multiple times

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



For future reference: https://github.com/snowballstem/snowball/issues/19


On 26.05.2015 12:29, Sven R. Kunze wrote:
Thanks. It seems as if I have use snowball. So, I go ahead and post my issue at github.


Maybe, I have difficulties to understand the relationship/dependencies between all these 'maybe' available dictionary/parser/stemmer packages.

What happens if I install all packages for a single language? (hunspell, myspell, ispell, snowball)

Are they complementary? Do they replace each other?


>>> \dFd
                             List of text search dictionaries
   Schema   |      Name       | Description
------------+-----------------+-----------------------------------------------------------
 pg_catalog | danish_stem     | snowball stemmer for danish language
 pg_catalog | dutch_stem      | snowball stemmer for dutch language
 pg_catalog | english_stem    | snowball stemmer for english language
 pg_catalog | finnish_stem    | snowball stemmer for finnish language
 pg_catalog | french_stem     | snowball stemmer for french language
 pg_catalog | german_stem     | snowball stemmer for german language
 pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
 pg_catalog | italian_stem    | snowball stemmer for italian language
 pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
 pg_catalog | portuguese_stem | snowball stemmer for portuguese language
 pg_catalog | romanian_stem   | snowball stemmer for romanian language
 pg_catalog | russian_stem    | snowball stemmer for russian language
pg_catalog | simple | simple dictionary: just lower case and check for stopword
 pg_catalog | spanish_stem    | snowball stemmer for spanish language
 pg_catalog | swedish_stem    | snowball stemmer for swedish language
 pg_catalog | turkish_stem    | snowball stemmer for turkish language
(16 rows)


On 26.05.2015 12:09, Albe Laurenz wrote:
Sven R. Kunze wrote:
However, are you sure, I am using snowball? Maybe, I am reading the
documenation wrong:
test=> SELECT * FROM ts_debug('german', 'system');
alias | description | token | dictionaries | dictionary | lexemes -----------+-----------------+--------+---------------+-------------+--------- asciiword | Word, all ASCII | system | {german_stem} | german_stem | {syst}
(1 row)

test=> \dFd german_stem
                 List of text search dictionaries
    Schema   |    Name     |             Description
------------+-------------+--------------------------------------
  pg_catalog | german_stem | snowball stemmer for german language
(1 row)

http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
but it seems as it depends on which packages (ispell, hunspell, myspell,
snowball + corresponding languages) my system has installed.

Is there an easy way to determine which of these packages PostgreSQL
uses AND what for?
If you use a standard PostgreSQL distribution, you will have no ispell
dictionary (as the documentation you quote says).
You can always list all dictionaries with "\dFd" in psql.

Sure. That might be the problem. It occurs to me that stems (if detected
as such) should be left alone.
In case a stem is real German word, it should be stemmed to itself anyway
If not, it might help not to stem in order to avoid errors.
Yes, but that would mean that you have a way to determine from a string
whether it is a word or a stem or both, and the software does not do that.

Yours,
Laurenz Albe


Regards,



--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze@xxxxxxxxxxxx
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543



--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux