
Re: How to switch off Snowball stemmer for tsearch2?

On Thu, 23 Aug 2007, Dmitry Koterov wrote:


Now

select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
- it is completely wrong!

I have a database with all Russian names; is it possible to use it (how?)
to

If you have such a database, why not just write a special dictionary and
put it in front?


Because this is a database of Russian NAMES, NOT a database of
surnames.


make lexize() not convert "Ivanov" to "Ivan" even if the ispell
dictionary contains an entry for "Ivan"? So, this pseudo-code logic is
needed:

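function new_lexize($string) {
  // Stem the token with the ispell dictionary.
  $stem = lexize('ru_ispell_cp1251', $string);
  // A stem that is a known name means the token is likely a surname: keep it unchanged.
  if ($stem in names_database) return $string; else return $stem;
}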

Maybe tsearch2 implements this logic already?

Write your own dictionary, which implements any logic you need. In your
case it's just a wrapper around ispell, which will return the original string
instead of the stem. See the example
http://www.sai.msu.su/~megera/postgres/fts/doc/fts-intdict-xmp.html
and the Russian article http://www.sai.msu.su/~megera/postgres/talks/fts_pgsql_intro.html#ftsdict
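
To make the wrapper idea concrete, here is a minimal Python model of the logic only. It is not tsearch2 code: ISPELL_STEMS, FIRST_NAMES and both function names are hypothetical stand-ins for the ispell dictionary and the names database, and a real dictionary would have to be written against the tsearch2 dictionary interface described in the links above.

```python
# Toy stand-in for the ispell dictionary: token -> normalized form.
# (Hypothetical data; a real setup would call the ispell dictionary instead.)
ISPELL_STEMS = {
    "Дмитриев": "Дмитрий",
    "Иванов": "Иван",
    "столами": "стол",
}

# Toy stand-in for the database of Russian names (hypothetical data).
FIRST_NAMES = {"Дмитрий", "Иван"}


def ispell_lexize(token):
    """Model of lexize('ru_ispell_cp1251', token): return the stem, or the token if unknown."""
    return ISPELL_STEMS.get(token, token)


def surname_safe_lexize(token):
    """Wrapper around ispell: keep the original token when its stem is a known name."""
    stem = ispell_lexize(token)
    if stem in FIRST_NAMES:
        # The stem collides with a name, so the token is probably a surname.
        return token
    return stem


if __name__ == "__main__":
    for word in ("Иванов", "Дмитриев", "столами"):
        print(word, "->", surname_safe_lexize(word))
```

With this toy data, surnames come back unchanged while ordinary words are still stemmed.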


sure, it's how text search mapping works.


Could you please elaborate?

You create a dictionary surnames_dict and configure pg_ts_cfgmap so that tokens of type nlword are processed by surnames_dict, ru_ispell, ru_stem (in that order), for example.
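
The effect of such an ordered mapping can be modelled in a few lines of Python. This is only a sketch of the lookup order, assuming the usual rule that a token goes to each dictionary in turn and the first one that recognizes it wins. The three functions reuse the dictionary names from this message but are hypothetical toy implementations, and lexize_chain is an invented helper, not a tsearch2 function.

```python
# Hypothetical toy data standing in for the ispell dictionary and the names database.
ISPELL_STEMS = {"Иванов": "Иван", "Дмитриев": "Дмитрий", "столами": "стол"}
FIRST_NAMES = {"Иван", "Дмитрий"}


def surnames_dict(token):
    """Recognize a token only if ispell would reduce it to a known name."""
    stem = ISPELL_STEMS.get(token)
    if stem in FIRST_NAMES:
        return [token]      # keep the surname as is
    return None             # not recognized: pass to the next dictionary


def ru_ispell(token):
    """Toy ispell: recognize only words present in its word list."""
    stem = ISPELL_STEMS.get(token)
    return [stem] if stem is not None else None


def ru_stem(token):
    """Toy fallback stemmer: recognizes everything (here it only lowercases)."""
    return [token.lower()]


def lexize_chain(token, dictionaries):
    """Try each dictionary in order; the first non-None answer is the result."""
    for dictionary in dictionaries:
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes
    return None


NLWORD_CHAIN = [surnames_dict, ru_ispell, ru_stem]

if __name__ == "__main__":
    for word in ("Иванов", "столами", "Unknown"):
        print(word, "->", lexize_chain(word, NLWORD_CHAIN))
```

Because surnames_dict comes first, "Иванов" never reaches ru_ispell, while words it does not recognize fall through to the remaining dictionaries.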


Of course I can create all word forms of all Russian names using ispell and
then subtract this full list from the Ispell dictionary (so I will remove
"Ivan", "Ivanami" etc. from it). But possibly tsearch2 has this subtraction
algorithm already.


Don't do that! Just go the plain way.

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83