Re: Fast tsearch2, trigram matching on short phrases

On Wed, 22 Aug 2007, Carlo Stonebanks wrote:

I have read that trigram matching (similarity()) performance degrades when the matching is on longer strings such as phrases. I need to quickly match strings and rate them by similarity. The strings are typically one to seven words long and will often include unconventional abbreviations and misspellings.
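
For reference, here is a minimal Python sketch of how pg_trgm's similarity() is usually described. Each string is lowercased and split into words, and each word is padded with two leading spaces and one trailing space. The score is the number of shared trigrams divided by the number of distinct trigrams in either string, so it always falls between 0 and 1. This is an approximation for illustration, not the contrib module's C code. In particular, the ASCII-only word split is an assumption. The sketch also shows why longer phrases cost more: the trigram set grows with the length of the string, so every comparison handles more trigrams.

```python
import re


def trigrams(text):
    """Approximate pg_trgm trigram set for `text` (illustrative, not the C code)."""
    grams = set()
    # Lowercase, then treat anything that is not an ASCII letter or digit
    # as a word separator (assumption: pg_trgm's real rule is locale-aware).
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams / distinct trigrams in either string; 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0  # neither string contains any words
    return len(ta & tb) / len(union)
```

For example, similarity("Main Street", "main st") shares 7 of 13 distinct trigrams, which gives a score of about 0.54.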

I have a stored function that does more thorough testing of the phrases, including spelling correction, abbreviation translation, etc., and scores the results; I pick the highest score that clears a fixed pass/fail constant. However, the function is slow. My solution was to reduce the number of rows passed to the function by pruning obvious mismatches with similarity(). Unfortunately, trigram matching on phrases is slow as well.
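
The prune-then-score approach described above can be sketched in Python as follows. The names best_match, expensive_score, prune_limit and pass_mark are hypothetical. expensive_score stands in for the stored function that does spelling correction and abbreviation translation. The default prune_limit of 0.3 mirrors pg_trgm's default similarity limit, and pass_mark is an arbitrary placeholder. The trigram helper repeats a pg_trgm-style approximation so that the block runs on its own.

```python
import re


def trigrams(text):
    # Approximation of pg_trgm: lowercase words, each padded as "  word ".
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    # Shared trigrams divided by distinct trigrams in either string.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def best_match(query, candidates, expensive_score, prune_limit=0.3, pass_mark=0.8):
    """Hypothetical two-stage matcher.

    Returns (candidate, score) for the highest expensive score that is
    >= pass_mark, or None if nothing passes.
    """
    # Stage 1: cheaply discard obvious mismatches.
    survivors = [c for c in candidates if similarity(query, c) >= prune_limit]
    # Stage 2: run the slow scorer only on the survivors.
    best = None
    for cand in survivors:
        score = expensive_score(query, cand)
        if score >= pass_mark and (best is None or score > best[1]):
            best = (cand, score)
    return best
```

The slow scorer only sees rows that survive the cheap filter. Raising prune_limit sends fewer rows to the expensive stage, but it also increases the risk of discarding a true match.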

You didn't show us the EXPLAIN ANALYZE output of your SELECT.


I have experimented with tsearch2 but I have two problems:

1) I need a "score" so I can decide whether a match passed or failed. Trigram similarity() returns a value on a fixed scale (0 to 1) that you can test against a constant, but I don't know whether rank() returns results that can be compared to a fixed value.

2) I need an efficient way to create vectors based on trigrams, and a way to create an index to support them. My tsearch2 experiment with normal vectors used gist(text tsvector) and an ON INSERT/UPDATE trigger to populate the vector field.
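
Both points in the list above can be illustrated with a toy in-memory inverted index over trigrams, written in Python. It is only a conceptual model of how a trigram index narrows the candidate set. PostgreSQL's GiST internals work differently, and the class TrigramIndex and its methods are hypothetical. The score is a shared-over-union ratio bounded by 0 and 1, so it can be compared directly to a fixed pass/fail value, which is the property point 1 asks about. The threshold default of 0.3 is assumed to mirror pg_trgm's default limit.

```python
import re
from collections import defaultdict


def trigrams(text):
    """pg_trgm-style trigram set (approximation): lowercase words padded as '  word '."""
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


class TrigramIndex:
    """Hypothetical toy index: trigram -> ids of rows whose text contains it."""

    def __init__(self):
        self.rows = {}                    # row id -> trigram set (the "vector")
        self.postings = defaultdict(set)  # trigram -> row ids

    def add(self, row_id, text):
        # Like an ON INSERT/UPDATE trigger: drop stale postings, then re-index.
        for g in self.rows.get(row_id, set()):
            self.postings[g].discard(row_id)
        grams = trigrams(text)
        self.rows[row_id] = grams
        for g in grams:
            self.postings[g].add(row_id)

    def search(self, query, threshold=0.3):
        """Return [(row_id, score)] with score >= threshold, best first."""
        q = trigrams(query)
        # Only rows sharing at least one trigram can score above zero.
        candidates = set()
        for g in q:
            candidates |= self.postings.get(g, set())
        hits = []
        for row_id in candidates:
            grams = self.rows[row_id]
            union = q | grams
            score = len(q & grams) / len(union) if union else 0.0
            if score >= threshold:
                hits.append((row_id, score))
        hits.sort(key=lambda h: (-h[1], h[0]))
        return hits
```

Calling add() with an existing row id behaves like an update. It removes the old postings first, so a search never returns a row based on text it no longer contains.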

Any suggestions on where to go with this project to improve performance would be greatly appreciated.

Carlo



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

