
Re: TSearch2 / Get all unique lexems


On Thu, 8 Dec 2005, Hannes Dorbath wrote:

On 07.12.2005 16:13, Oleg Bartunov wrote:
Hmm, you could dump the tsvector column and use awk + sort + uniq.

Thanks. I was hoping for something that could be done inside a PL/pgSQL procedure. I'm trying to integrate pg_trgm with Tsearch2, and I'm still on my UTF-8 database. Yes, I know there is _NO_ UTF-8 support of any kind in Tsearch2 yet, but I got it working to a degree that is OK for my application (I created my own stemmer variant, ispell dictionary, affix file, etc.). The last missing bit is a source of words for pg_trgm. I cannot use the stat() function, because it breaks as soon as it sees a UTF-8 character.

Unless there is some way to ignore errors when converting UTF-8 to text, this is a dead end: the stat() function uses the text representation.

You have to wait for a new release with full UTF-8 support or go the 'lazy' way,
i.e. use any tools to get a list of unique words and create a pg_trgm index.
There are several questions:
* Do you actually need to be synchronized with the tsvector?
* Do you need to recognize all words? I suppose not. In real life you should
  have a dictionary of the words you certainly need to recognize.
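
The 'lazy' way could be sketched roughly as follows, using Python in place of awk + sort + uniq. This is only an illustration under stated assumptions: the tsvector column has already been dumped to plain text, one value per line, in the usual output form ('lexeme':positions), and no lexeme contains a backslash. The name unique_lexemes and the sample lines are hypothetical, not part of Tsearch2 or pg_trgm.

```python
import re

# Matches one quoted lexeme in tsvector text output, e.g. 'cat':3 or 'fat':2,4.
# Inside the quotes, a literal quote is written as two quotes ('').
# Assumption: lexemes contain no backslashes (those are escaped in dumps).
LEXEME_RE = re.compile(r"'((?:[^']|'')*)'")


def unique_lexemes(lines):
    """Return the sorted list of distinct lexemes found in dumped tsvector lines."""
    seen = set()
    for line in lines:
        for match in LEXEME_RE.finditer(line):
            # Undo the quote doubling used by the tsvector output format.
            seen.add(match.group(1).replace("''", "'"))
    return sorted(seen)


if __name__ == "__main__":
    # Hypothetical dump content; real input would be read from the dump file,
    # opened with encoding="utf-8" so multibyte characters survive.
    sample = [
        "'cat':3 'fat':2,4 'rat':5",
        "'cat':1 'müller':2 'o''neil':3",
    ]
    for word in unique_lexemes(sample):
        print(word)
```

The resulting word list could then be loaded into a one-column table and indexed with pg_trgm. Because the list is built outside the database, it is not kept in sync with the tsvector column and would have to be regenerated periodically.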


	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

