exposing more parse was: Re: tsearch2: setting weights on tsquery

Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> · Tue, 21 Oct 2008 13:20:12 +0200

On Tue, 21 Oct 2008 10:29:52 +0200
Ivan Sergio Borgonovo <mail@xxxxxxxxxxxxxxx> wrote:

I came across this:
http://grokbase.com/topic/2007/08/07/general-tsearch2-plainto-tsquery-with-or/r92nI5l_k9S4iKcWdCxKs05yFQk

It is closely related to what I need.
Building on ts_parse I could get an array of tokens, filter out the
ones that are not considered significant, and end up with an array
similar to: strip(to_tsvector(...))
Since I would then have an array, I could loop over it and build up
a tsquery according to my needs.
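
As a rough sketch of the "build a weighted tsquery from cleaned-up
input" idea, here is a different and cruder route than looping over
ts_parse output: let plainto_tsquery do the parsing, then rewrite its
text form to attach a weight to each lexeme. The function name
weighted_plain_query is hypothetical, and the sketch assumes that the
text form of a plainto_tsquery result is a list of single-quoted
lexemes joined by &. The regexp here only touches already-parsed
output; it does not replace the parser. I have not tried it on awkward
input.

```sql
-- Hypothetical helper (not a built-in): attach one weight label to every
-- lexeme that plainto_tsquery produces for the given input.
CREATE OR REPLACE FUNCTION weighted_plain_query(regconfig, text, text)
RETURNS tsquery AS $$
    SELECT regexp_replace(
               plainto_tsquery($1, $2)::text,   -- e.g. 'foo' & 'bar'
               E'''((?:[^'']|'''')+)''',        -- one single-quoted lexeme
               E'''\\1'':' || $3,               -- same lexeme plus :weight
               'g'
           )::tsquery;                          -- plain cast, no re-normalizing
$$ LANGUAGE sql STABLE;

-- Usage: one weight per input string, combined with the tsquery && operator.
SELECT weighted_plain_query('english', 'ratto && matto | gatto & the', 'A')
    && weighted_plain_query('english', 'sasso|&passo lasso a', 'B');
```

The result should be along the lines of the weighted query in the
quoted example below, subject to whatever stemming the chosen
configuration applies.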

While Mike Rylander's solution may work, the regexp is not a good
enough substitute for the tsearch2 parser.

I didn't have the time to look at the source code (and frankly it
feels like a pity to download several MB just to look at one
function), but I bet the code spends a lot of time turning strings
into tsvectors.
If there were more tools to manipulate tsvectors and turn them into
tsqueries, we could build more interesting search functions in
plpgsql et al. and maybe cache some of the conversions (string ->
tsvector) instead of spending time serializing and de-serializing
stuff.

But here is the more pragmatic question:
### LOOK HERE FIRST ;) ####
Or am I missing something among the already available
functions/operators?

> plainto_tsquery is handy for turning a user-supplied string into a
> tsquery.
> 
> It strips "control" characters and glues the lexemes together with &.

> Now I have several strings coming from user input and what I'd like
> to do is assign a different weight to each part.
> 
> eg.
> input1 = "ratto && matto | gatto & the"
> input2 = "sasso|&passo lasso a"
> ->
> tsquery = 'ratto:A & matto:A & gatto:A & sasso:B & passo:B &
> lasso:B'
> 
> I could prepare the input outside postgresql and then
> "concatenate" the queries, but plainto_tsquery is very convenient
> since it always "cleans up" the input the right way and adds & in
> the right places.
> 
> Otherwise I could use more than one tsvector for searching input1
> and input2 but it seems it is slower than
> update t1 set FT1IDX=(
> 
> setweight(to_tsvector('pg_catalog.english',
> coalesce(input1,'')), 'A') || ' ' ||
> 
> setweight(to_tsvector('pg_catalog.english',
> coalesce(input2,'')), 'B')
> )
> 
> and I won't be able to rank on all fields at once.
> 

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)