On Thu, 27 Jan 2005, Max wrote:
Hi,
I'm setting up a simple search engine using Tsearch2.
The basic idea is: a user enters a search query and a maximum of 1000 results is returned, sorted by date, newest first.
At the moment the table holding the searchable data has 1.1 million entries.
It works great when the search only produces a few hundred results. However when people search on a common word with 10.000+ results, there's a performance problem.
The database design looks like this:
CREATE TABLE posts_index ( .... startdate INT NOT NULL, idxFTI tsvector, .... );
Where startdate is a unix timestamp, and idxFTI is a tsvector with the data to be searched.
Since only 1000 results need to be returned sorted by date (newest first), I hoped to solve the problem by installing the btree_gist extension and adding the following index:
CREATE INDEX idxFTI_idx2 ON posts_index USING gist(idxFTI,(-startdate));
However the -startdate portion of the index doesn't seem to be used:
------
EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM posts_index i, to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY (-i.startdate) LIMIT 1000;
I assume you already vacuum your db. Hmm, seems you need to rewrite your query.
EXPLAIN SELECT startdate, headline(subject,q) AS subject FROM ( SELECT startdate, subject from posts_index i,
to_tsquery('default', '_SEARCH_TERM_') AS q WHERE idxfti @@ q ORDER BY
(-i.startdate) LIMIT 1000) as foo;
I bet your query will be much faster. In your query all founded tuples should be read from disk to calculate headline(), while in my query maximum 1000 tuples will be read. So, performance gain could be noticeable, for example, if search returns 10,000 tuples, my query will be 10x faster than yours :) I think this is what you observed.
QUERY PLAN
Limit (cost=5152014.10..5152016.60 rows=1000 width=126)
-> Sort (cost=5152014.10..5155079.61 rows=1226201 width=126)
Sort Key: (- i.startdate)
-> Nested Loop (cost=0.00..4912754.84 rows=1226201 width=126)
-> Function Scan on q (cost=0.00..12.50 rows=1000 width=32)
-> Index Scan using idxfti_idx2 on posts_index i (cost=0.00..4891.27 rows=1227 width=253)
Index Cond: (i.idxfti @@ "outer".q)
----
Any suggestions?
Regards,
Max
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)
Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend