Search Postgresql Archives

Re: ranking how "similar" are tsvectors was: OR tsquery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 17 Jan 2010 20:19:59 +0300 (MSK)
Oleg Bartunov <oleg@xxxxxxxxxx> wrote:

> Ivan,
> 
> You can write function to get lexemes from tsvector:

> CREATE OR REPLACE FUNCTION ts_stat(tsvector, weights text, OUT
> word text, OUT ndoc integer, OUT nentry integer)
> RETURNS SETOF record AS
> $$
>      SELECT ts_stat('SELECT ' || quote_literal( $1::text ) ||
> '::tsvector', quote_literal( $2::text) ); $$ LANGUAGE SQL RETURNS
> NULL ON NULL INPUT IMMUTABLE;

Thanks very much Oleg.

Still it is not really making the pain go away.
I've weights stored in my tsvector and I need to build the query
using them.

This means that if I have:
'aubergine':4A 'orange':1B 'banana':5A 'apple':3C
and
'coconut':3B 'bananas':1A 'tomatoes:2C
stored in a column (tsv) I really would like to build up the query:

to_tsquery('aubergine:A | orange:B | bananas:A | apple:C')

then

tsv
@@
to_tsquery('aubergine:A | orange:B | bananas:A | apple:C')

and relative ts_rank()

I'm aware that it is not symmetrical, but it looks as the cheapest
and fastest thing I can do right now.

I'm using pg_catalog.english. Am I supposing correctly that NO
lexeme will contain spaces?

If that is the case I could simply use string manipulation tools.
Not nice to see but it will work.

> Then, you can create ARRAY like:
> 
> select ARRAY ( select (ts_stat(fts,'*')).word from papers where
> id=2);
> 
> Then, you will have two arrays and you're free to apply any
> similarity function (cosine, jaccard,....) to calculate what do
> you want. If you want to preserve weights, then use weight label
> instead of '*'.

What ts_rank does is more than enough right now.

> Another idea is to use array_agg, but I'm not ready to discuss it.
> 
> Please, keep in mind, that document similarity is a hot topic in

Not hard to imagine.

> IR, and, yes, I and Teodor have something about this, but code
> isn't available for public. Unfortunately, we had no sponsor for
> full-text search for last year and I see no perspectives this
> year, so we postpone our text-search development.

Good luck. Do you have anything like http://www.chipin.com/ for
small donations?

-- 
Ivan Sergio Borgonovo
http://www.webthatworks.it


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux