Re: HTML tags and tsearch2

On Thu, 26 Jun 2008, Joanna Sharman wrote:

Hi,

I have recently started experimenting with tsearch2 and it seems that the default behaviour is to ignore HTML tags and treat them as word-separators. What I would like it to do is to ignore HTML tags within words, but instead of creating separate words, combine the characters separated by the tag into one word.

For example: in the database I have words like 'K<sub>ir</sub>' that need to be searched using the term without HTML tags, i.e. 'Kir'. Currently, the HTML tags are ignored and two words are stored in the vector, 'k' and 'ir'. I would like only one word, 'kir', to be stored in the vector, so that searches using the word 'kir' will match the row.

Two options: write your own HTML parser, or preprocess the text (strip the tags) before passing it to to_tsvector.
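
As an illustration of the preprocessing route, here is a minimal sketch in Python, assuming the tags are stripped in the application before the text is handed to to_tsvector. The names TagStripper and strip_tags are hypothetical, not part of tsearch2, and the approach simply joins the text on either side of each tag, which is what the 'K<sub>ir</sub>' case needs.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text content and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Return the text of `html` with all tags removed.

    Text fragments are concatenated with no separator, so characters
    split by an inline tag end up in the same word.
    """
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_tags("K<sub>ir</sub>"))  # Kir
```

The cleaned string would then be indexed instead of the raw column value. Note that joining with no separator also glues together words separated only by block-level tags such as <p> or <br>, so real data may need those replaced with a space first.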


A second, related question is whether it is possible to cause tsearch2 to split up words when it encounters digits, e.g. 'TM8' into 'TM' and '8'.

You can write your own dictionary, or use dict_regex from http://vo.astronet.ru/arxiv/dict_regex.html
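
If a custom dictionary is more than is needed, the same effect can be approximated by preprocessing. The sketch below is not dict_regex and not a tsearch2 dictionary; it is a swapped-in alternative that inserts a space at every boundary between an ASCII letter and a digit before the text reaches to_tsvector. The function name split_alnum is hypothetical.

```python
import re

# Zero-width match at any position where a letter is followed by a digit,
# or a digit is followed by a letter (ASCII letters only in this sketch).
_BOUNDARY = re.compile(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])")


def split_alnum(text: str) -> str:
    """Insert a space at each letter/digit boundary."""
    return _BOUNDARY.sub(" ", text)


print(split_alnum("TM8"))  # TM 8
```

The search terms would need the same treatment at query time, otherwise a query for 'TM8' would no longer line up with the indexed 'TM' and '8'.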


I am not sure if this functionality is possible to implement using tsearch2 or if there might be a better way, so I would be grateful for any advice or pointers to further reading on how I might do this. (I am using PostgreSQL version 8.1.10)

Think about upgrading to 8.3.


Many thanks in advance,
Joanna



	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

