Alright. I got it running and used
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ ;
specifically:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
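
For anyone following along, a setup along these lines should give a configuration like the one queried below. This is only a minimal sketch, not necessarily the exact commands used here. It assumes the files from the tarball were copied into $SHAREDIR/tsearch_data as german_compound.dict and german_compound.affix (hypothetical file names), and that the stop word list is the german.stop file shipped with PostgreSQL.

```sql
-- Ispell dictionary built from the downloaded files.
-- DictFile/AffFile base names are assumptions: PostgreSQL appends
-- .dict and .affix and looks in $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german_compound,
    AffFile   = german_compound,
    StopWords = german          -- german.stop, assumed to be the stock file
);

-- Start from the built-in German configuration.
CREATE TEXT SEARCH CONFIGURATION public.german_compound_ispell (
    COPY = pg_catalog.german
);

-- Try the ispell dictionary first, then fall back to the snowball stemmer.
ALTER TEXT SEARCH CONFIGURATION public.german_compound_ispell
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH german_ispell, german_stem;

-- Query the dictionary on its own, without the parser or the fallback.
SELECT ts_lexize('german_ispell', 'wasserkraft');
```

Calling ts_lexize directly on german_ispell can help tell whether an odd split comes from the ispell files themselves rather than from the rest of the configuration.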
Not sure where to find up-to-date/authorized ispell dictionaries.

I figured that I need to change this particular dictionary in order to avoid "ion" being split away from words like "produktION/konstruktION" etc.:

=# select * from ts_debug('public.german_compound_ispell', 'konstruktion');
   alias   |   description   |    token     |        dictionaries         |  dictionary   |           lexemes
-----------+-----------------+--------------+-----------------------------+---------------+------------------------------
 asciiword | Word, all ASCII | konstruktion | {german_ispell,german_stem} | german_ispell | {konstruktion,konstrukt,ion}

The splitting of compound words is unfortunately not consistent (wasserkraft vs. konstruktionsplan):

=# select * from ts_debug('public.german_compound_ispell', 'wasserkraft');
   alias   |   description   |    token    |        dictionaries         |  dictionary   |          lexemes
-----------+-----------------+-------------+-----------------------------+---------------+----------------------------
 asciiword | Word, all ASCII | wasserkraft | {german_ispell,german_stem} | german_ispell | {wasserkraft,wasser,kraft}

=# select * from ts_debug('public.german_compound_ispell', 'konstruktionsplan');
   alias   |   description   |       token       |        dictionaries         |  dictionary   |       lexemes
-----------+-----------------+-------------------+-----------------------------+---------------+---------------------
 asciiword | Word, all ASCII | konstruktionsplan | {german_ispell,german_stem} | german_ispell | {konstruktion,plan}

Not sure how the 'sch' comes to be:

=# select * from ts_debug('public.german_compound_ispell', 'rundflansch');
   alias   |   description   |    token    |        dictionaries         |  dictionary   |           lexemes
-----------+-----------------+-------------+-----------------------------+---------------+------------------------------
 asciiword | Word, all ASCII | rundflansch | {german_ispell,german_stem} | german_ispell | {rund,flansch,rund,flan,sch}

This is another funny example:

=# select * from ts_debug('public.german_compound_ispell', 'datenbanken');
   alias   |   description   |    token    |        dictionaries         |  dictionary   |                                     lexemes
-----------+-----------------+-------------+-----------------------------+---------------+---------------------------------------------------------------------------------
 asciiword | Word, all ASCII | datenbanken | {german_ispell,german_stem} | german_ispell | {datenbank,daten,date,banken,daten,date,bank,daten,date,banken,daten,date,bank}

On 01.06.2015 09:25, Sven R. Kunze wrote:
--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze@xxxxxxxxxxxx
web: www.tbz-pariv.de

Managing Director: Dr. Reiner Wohlgemuth
Registered office: Chemnitz
Court of registration: Chemnitz HRB 8543