Re: How to switch off Snowball stemmer for tsearch2?

On Thu, 23 Aug 2007, Dmitry Koterov wrote:

Oh! Thanks!

delete from pg_ts_cfgmap where dict_name = ARRAY['ru_stem'];

solves the root of the problem. But unfortunately, russian.med (ru_ispell_cp1251) contains all Russian names, so "Ivanov" is converted to "Ivan" by ispell too. :-(

Now

select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"    (Dmitriev -> Dmitry)
select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"    (Ivanov -> Ivan)

This is completely wrong!

I have a database with all Russian names. Is it possible to use it (and how?) to make lexize() not convert "Ivanov" to "Ivan", even if the ispell dictionary contains an entry for "Ivan"? So, this pseudo-code logic is needed:

function new_lexize($string) {
 $stem = lexize('ru_ispell_cp1251', $string);
 if ($stem in names_database) return $string; else return $stem;
}

Maybe tsearch2 implements this logic already?

If you have such a database, why not just write a special dictionary and put it in front?

Sure, that's how text search mapping works. Dmitry, it seems your company could be my client :)
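A minimal Python sketch of the "special dictionary in front" idea, assuming the tsearch2 rule that the dictionaries mapped to a token type are consulted in order and the first one that recognizes the token decides the result. names_dictionary, toy_ispell and lexize_chain are hypothetical names; toy_ispell is a tiny lookup table standing in for ru_ispell_cp1251, not real ispell behaviour.

```python
from typing import Callable, Optional

# A "dictionary" takes a lowercased token and returns a list of lexemes,
# or None if it does not recognize the token (so the next one is tried).
Dictionary = Callable[[str], Optional[list[str]]]


def names_dictionary(known_surnames: set[str]) -> Dictionary:
    """Hypothetical dictionary built from a table of surnames: it recognizes
    a surname and returns it unchanged, so later dictionaries never see it."""
    def lexize(token: str) -> Optional[list[str]]:
        return [token] if token in known_surnames else None
    return lexize


def toy_ispell(token: str) -> Optional[list[str]]:
    """Stand-in for ru_ispell_cp1251: a tiny lookup table, not real ispell."""
    table = {"ivanov": ["ivan"], "ivana": ["ivan"], "dmitriev": ["dmitry"]}
    return table.get(token)


def lexize_chain(token: str, chain: list[Dictionary]) -> list[str]:
    """Consult dictionaries in order; the first one that recognizes the
    token decides the result. In this sketch, unrecognized tokens yield []."""
    token = token.lower()
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []


surnames = {"ivanov", "dmitriev"}
chain = [names_dictionary(surnames), toy_ispell]
print(lexize_chain("Ivanov", chain))           # ['ivanov']
print(lexize_chain("Ivana", chain))            # ['ivan']
print(lexize_chain("Dmitriev", [toy_ispell]))  # ['dmitry']
```

Because the names dictionary answers first, ispell never sees "Ivanov". Note that the sketch checks the word itself rather than the ispell stem, so an inflected first name such as "Ivana" still falls through to ispell.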


On 8/22/07, Oleg Bartunov <oleg@xxxxxxxxxx> wrote:

On Wed, 22 Aug 2007, Dmitry Koterov wrote:

Suppose I cannot add such synonyms, because:

1. There are a lot of surnames; I cannot take care of all of them.
2. After adding a new surname I have to recalculate all full-text indices, which costs too much (about 10 days to complete the recalculation).

So I need exactly what I asked: switch OFF stem guessing if a word is not in the dictionary.

No problem, just modify pg_ts_cfgmap, which contains the mapping from token types to dictionaries.

If you change the configuration, you should rebuild the tsvector values and reindex. 10 days looks very suspicious.
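To illustrate why a changed mapping means rebuilding, here is a toy Python model, assuming pg_ts_cfgmap can be viewed as "token type -> ordered list of dictionaries". The token type name cyrillic_word, the helper functions and the toy dictionaries are all hypothetical. Lexemes computed before removing ru_stem keep the stemmer's guesses (here "petr"), which the new configuration no longer produces, so stored tsvector values have to be recomputed.

```python
from typing import Optional

# Hypothetical model of pg_ts_cfgmap: token type -> ordered dictionary names.
# The token type name "cyrillic_word" is made up for this sketch.
CfgMap = dict[str, list[str]]


def toy_lexize(dict_name: str, token: str) -> Optional[list[str]]:
    """Toy stand-ins, not the real dictionaries: None means 'not recognized'."""
    if dict_name == "ru_ispell_cp1251":
        return {"ivanov": ["ivan"], "met": ["meet"]}.get(token)
    if dict_name == "ru_stem":
        # A Snowball-like stemmer guesses a stem for every word it sees.
        return [token[:-2]] if token.endswith("ov") else [token]
    return None


def drop_dictionary(cfgmap: CfgMap, dict_name: str) -> CfgMap:
    """Copy of the mapping with dict_name removed from every chain."""
    return {tok: [d for d in chain if d != dict_name] for tok, chain in cfgmap.items()}


def to_lexemes(text: str, cfgmap: CfgMap, tok_type: str = "cyrillic_word") -> set[str]:
    """Run each word through its dictionary chain; the first recognizer wins.
    In this toy model, a word no dictionary recognizes is not indexed."""
    lexemes: set[str] = set()
    for word in text.lower().split():
        for dict_name in cfgmap.get(tok_type, []):
            result = toy_lexize(dict_name, word)
            if result is not None:
                lexemes.update(result)
                break
    return lexemes


old_map: CfgMap = {"cyrillic_word": ["ru_ispell_cp1251", "ru_stem"]}
new_map = drop_dictionary(old_map, "ru_stem")

doc = "Petrov met Ivanov"
stored = to_lexemes(doc, old_map)  # computed before the change
fresh = to_lexemes(doc, new_map)   # what the new configuration produces
print(sorted(stored))  # ['ivan', 'meet', 'petr']
print(sorted(fresh))   # ['ivan', 'meet']
```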



On 8/22/07, Oleg Bartunov <oleg@xxxxxxxxxx> wrote:

On Wed, 22 Aug 2007, Dmitry Koterov wrote:

Hello.

We use ispell dictionaries for tsearch2 (ru_ispell_cp1251).
Currently, the Snowball stemmer is also configured.

How do I properly switch OFF the Snowball stemmer for Russian without turning off the ispell stemmer? (It is really needed, because "Ivanov" is not the same as "Ivan".)
Is it enough and correct to simply delete the row from pg_ts_dict?

Here is the dump of the pg_ts_dict table:

Don't use a dump; a plain SELECT would be better. In your case, I'd suggest following the standard way: create a synonym file like

ivanov ivanov

and use it before the other dictionaries. The synonym dictionary will recognize 'Ivanov' and return 'ivanov'.
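A minimal Python sketch of the synonym dictionary described above, assuming a file with one "word synonym" pair per line. load_synonyms and synonym_lexize are hypothetical helpers, not tsearch2 functions; returning None stands for "not recognized", which lets the next dictionary in the chain (e.g. ispell) handle the word.

```python
from typing import Optional


def load_synonyms(text: str) -> dict[str, str]:
    """Parse 'word synonym' lines (e.g. 'ivanov ivanov') into a lookup table.
    Lines with fewer than two fields are skipped; entries are lowercased."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            table[parts[0].lower()] = parts[1].lower()
    return table


def synonym_lexize(table: dict[str, str], token: str) -> Optional[list[str]]:
    """Return the synonym if the word is listed, else None so that the
    next dictionary in the chain gets a chance."""
    synonym = table.get(token.lower())
    return [synonym] if synonym is not None else None


synonyms = load_synonyms("ivanov ivanov\npetrov petrov\n")
print(synonym_lexize(synonyms, "Ivanov"))  # ['ivanov']
print(synonym_lexize(synonyms, "Ivan"))    # None
```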



dict_name | dict_init | dict_initoption | dict_lexize | dict_comment
en_ispell | spell_init(internal) | DictFile=/usr/lib/ispell/english.med,AffFile=/usr/lib/ispell/english.aff,StopFile=/usr/share/pgsql/contrib/english.stop | spell_lexize(internal,internal,integer) |
en_stem | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
ispell_template | spell_init(internal) | | spell_lexize(internal,internal,integer) | ISpell interface. Must have .dict and .aff files
ru_ispell_cp1251 | spell_init(internal) | DictFile=/usr/lib/ispell/russian.med,AffFile=/usr/lib/ispell/russian.aff,StopFile=/usr/share/pgsql/contrib/russian.stop.cp1251 | spell_lexize(internal,internal,integer) |
ru_stem_cp1251 | snb_ru_init_cp1251(internal) | contrib/russian.stop.cp1251 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. WINDOWS (cp1251) Encoding
ru_stem_koi8 | snb_ru_init_koi8(internal) | contrib/russian.stop | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. KOI8 Encoding
ru_stem_utf8 | snb_ru_init_utf8(internal) | contrib/russian.stop.utf8 | snb_lexize(internal,internal,integer) | Russian Stemmer. Snowball. UTF8 Encoding
simple | dex_init(internal) | | dex_lexize(internal,internal,integer) | Simple example of dictionary.
synonym | syn_init(internal) | | syn_lexize(internal,internal,integer) | Example of synonym dictionary
thesaurus_template | thesaurus_init(internal) | | thesaurus_lexize(internal,internal,integer,internal) | Thesaurus template, must be pointed Dictionary and DictFile


        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


