On Fri, Mar 6, 2015 at 5:05 PM, Patrick Dung <patrick_dkt@xxxxxxxxxxxx> wrote: > Resend. > > How to quickly compare the similarity of two tsvector? > check http://www.sai.msu.su/~megera/postgres/talks/pgcon-2012.pdf > > On Monday, March 2, 2015 11:01 PM, Patrick Dung <patrick_dkt@xxxxxxxxxxxx> > wrote: > > > Hello, > > I had a database with articles or attachment stored in bytea format. > I also had a trigger: it insert/update the tsv column when a record is > added/updated. > The tsv column had a GIN index. > With this setting, I can do very fast keyword search on the tsv. > > Suppose I had a specific record (id=100000). > How to list similar records based on ranking? > In that case, I had to compare a tsvector with another tsvector. > > I had this SQL which make the original tsv as a text and then to tsquery, > Then I can compare a tsv and a tsquery. > SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) > as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, > (SELECT tsv, id FROM items WHERE id=100000) AS original WHERE i.id != > original.id ORDER BY similarity; > > items table: > id bigint > company varchar > industry varchar > description varchar > post_timestamp timestamp > attachment bytea > tsv tsvector > > The problem is that this is very slow. > Any comment? > > Thank and regards, > Patrick > > -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general