Re: ts_headline


On Sat, 23 Feb 2008, Stephen Davies wrote:

As it turns out, all I needed was in the doco but the key element - the first
config arg to ts_headline - was not in any of the examples so I missed it.

Aha. The original examples were based on the default configuration; the concept was later changed, but the examples were not
updated.
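
For readers who hit the same gap, here is a minimal sketch of a call with the text search configuration passed explicitly as the first argument. The sample document and query are made up for illustration.

```sql
-- General shape: ts_headline([config,] document, query [, options])
-- 'english' is the configuration used to parse the raw document text.
SELECT ts_headline(
         'english',
         'Full text search in PostgreSQL can highlight matching words.',
         to_tsquery('english', 'highlight & word')
       ) AS headline;
```

If the first argument is left out, ts_headline uses the session's default_text_search_config, which may not be the configuration the tsquery was built with.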


Would it be possible for ts_headline to work with the pre-parsed tsvector?

It's impossible; Richard already explained the reasons to you.


I see references to future plans for phrase searching in ts. Is there a date
for this?

Not yet. The problem is mostly algebraic :) A simple 'exact search' is doable, but
we need something more, since we support boolean operators, pluggable dictionaries (which can produce several lexemes for one word, for example),
and document structure (lexeme weights). So we need to define a consistent
algebra for text in order to get predictable results. This is quite a complex task
that requires a lot of dedicated time, which we don't have.


Cheers and thanks,
Stephen Davies


On Friday 22 February 2008 22:54, Oleg Bartunov wrote:
On Fri, 22 Feb 2008, Stephen Davies wrote:
Hmmmm!
I think I now understand the ts position better, thank you.

Part of my problem has been that I am used to the functionality of Open
Text's LCS (aka BASIS) product which handles text differently.

It includes the position (and context) information in the index and does
"remember" how the text was parsed so does not need to reparse to insert
hit navigation tags nor need pointers as to how to parse queries. (It
also supports phrase searching.)

Now that I have a better understanding of ts, I think I will be able to
make it do at least most of what I hoped for.

I'm wondering if it was not described in the text search documentation :)

Thank you again for your help with this.

Cheers,
Stephen Davies

On Friday 22 February 2008 20:45, Richard Huxton wrote:
Stephen Davies wrote:
Unfortunately, my link to the box with the test database is down due to
lack of maintenance by our local telco (Telstra) but I think that I
also missed the optional config arg to ts_headline.

The lack of link also means that I cannot confirm your findings but
your logic looks good.

Looks like ALTER DATABASE <dbname> SET default_text_search_config = 'english' is
what you need.

It begs the question, however, as to why ts_headline needs to reparse
the raw text.
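
A minimal sketch of that setting, assuming a database called mydb (a placeholder name). In PostgreSQL 8.3 and later the parameter is named default_text_search_config.

```sql
-- Make 'english' the default text search configuration for one database.
-- "mydb" is a hypothetical database name; new sessions pick up the change.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- Or change it for the current session only.
SET default_text_search_config = 'pg_catalog.english';

-- Check what the current session is using.
SHOW default_text_search_config;
```

Functions such as to_tsvector, to_tsquery and ts_headline use this value whenever they are called without an explicit configuration argument.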

It needs to line up tsvector lexemes with actual characters in the text.
The tsvector is missing punctuation, any stopwords (the, it, a) as well
as being stemmed (if your dictionary does that).
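
To see what is lost, it can help to compare a sentence with its tsvector. This is only an illustrative sketch with a made-up sentence:

```sql
-- The original text, with punctuation, stopwords and full word forms.
SELECT 'The cats sat on the mat, quietly.' AS original_text;

-- The tsvector keeps only normalized lexemes and their word positions.
-- Stopwords such as "the" and "on" are dropped, punctuation is gone,
-- and words are reduced to stems by the 'english' configuration.
SELECT to_tsvector('english', 'The cats sat on the mat, quietly.') AS parsed;
```

There is no way to rebuild the first result from the second, which is why ts_headline has to go back to the raw text.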

Also, it's looking for a short span of words that provide the best
match. That might not be a complete match of course, and is different to
how you'd normally look to use a tsvector.

At least in my case, I am using a trigger to parse the combination of
Title and Abstract to a tsvector field in the table row (as suggested
in 12.2.2 and 12.4.3 in the doco) so that the tsvector is already
available to ts_headline.

If ts_headline had the ability to use that pre-parsed tsvector, my
problem would never have arisen - and the performance of ts_headline
would be improved.
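
The setup described here might look roughly like the following sketch. The table and column names (papers, title, abstract, tsv) are assumptions for illustration, not taken from the original poster's schema; tsvector_update_trigger is the built-in trigger function covered in the documentation.

```sql
-- Hypothetical table holding the raw text plus a stored tsvector column.
CREATE TABLE papers (
    id       serial PRIMARY KEY,
    title    text,
    abstract text,
    tsv      tsvector
);

-- Keep tsv in sync with title and abstract on every insert or update.
-- The configuration name must be schema-qualified here.
CREATE TRIGGER papers_tsv_update
    BEFORE INSERT OR UPDATE ON papers
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv, 'pg_catalog.english', title, abstract);

-- Index the stored tsvector so that @@ searches can use it.
CREATE INDEX papers_tsv_idx ON papers USING gin (tsv);
```

The stored column speeds up matching with @@, but ts_headline is still called on the title and abstract text itself.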

Maybe. It would still have to parse the text to some degree though, just
to get the original words & punctuation into the headline.
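
Since the reparse cannot be avoided, one common way to keep the cost down is to call ts_headline only on the few rows actually shown. The sketch below uses a throwaway table, papers_demo, whose name and contents are invented for illustration.

```sql
-- Throwaway demo data; table name and rows are illustrative only.
CREATE TEMP TABLE papers_demo (id int, title text, abstract text, tsv tsvector);
INSERT INTO papers_demo (id, title, abstract) VALUES
    (1, 'Full text search', 'Headlines are built from the original text.'),
    (2, 'Unrelated paper',  'Nothing relevant here.');
UPDATE papers_demo
   SET tsv = to_tsvector('english', coalesce(title, '') || ' ' || coalesce(abstract, ''));

-- Match and rank using the stored tsvector, limit the result,
-- and only then reparse the raw text of the surviving rows.
SELECT id,
       ts_headline('english', title || ' ' || abstract, q) AS headline
FROM (
    SELECT id, title, abstract, q
    FROM papers_demo, to_tsquery('english', 'headline') AS q
    WHERE tsv @@ q
    ORDER BY ts_rank(tsv, q) DESC
    LIMIT 10
) AS hits;
```

The inner query never touches the raw text, so the expensive parsing step runs at most ten times regardless of how many rows match.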

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



