On Wed, Jul 13, 2011 at 17:36, Oleg Bartunov <oleg@xxxxxxxxxx> wrote:
> I didn't notice, reading 40K tuples in random order takes a long time and
> this is a problem of any database. Can you measure time to read all
> documents found?

As you asked, I measured the time required to read all documents.

For reference, after having dropped the operating system cache, my machine
can read a 1 GB file in 20 seconds, that is, 50 MB per second.

Here are the stats for table posts_1000000:

Table size:       117 MB
TOAST table size: 8356 MB
Index size:       1720 MB

I forced PostgreSQL to read all documents using the following query, which
doesn't involve ts_rank:

explain analyze select sum(length(document_vector)) from posts_1000000;

 Aggregate  (cost=27472.52..27472.53 rows=1 width=18) (actual time=346952.556..346952.557 rows=1 loops=1)
   ->  Seq Scan on posts_1000000  (cost=0.00..24975.01 rows=999001 width=18) (actual time=0.023..1793.523 rows=999001 loops=1)
 Total runtime: 346952.595 ms

Then I ran a similar query that does involve ts_rank:

explain analyze select sum(ts_rank_cd(document_vector, to_tsquery('english', 'crare'), 32)) from posts_1000000;

 Aggregate  (cost=27472.52..27472.53 rows=1 width=18) (actual time=373713.957..373713.958 rows=1 loops=1)
   ->  Seq Scan on posts_1000000  (cost=0.00..24975.01 rows=999001 width=18) (actual time=20.045..1847.897 rows=999001 loops=1)
 Total runtime: 373714.031 ms

The first query ran in 347 seconds; the second one in 374 seconds.

Conclusion: there is no significant overhead in the ts_rank function itself.
It is slow because ts_rank has to read, in random order, 40,000 tsvectors
stored in the TOAST table. The slow execution time looks like a direct
consequence of storing tsvectors in the TOAST table...

> :( The only solution I see is to store enough information for ranking in
> index.

Is this the expected behavior? How can I improve that?

Thanks,
Nicolas
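
P.S. In case someone wants to reproduce the size figures above, they can be
obtained with queries along these lines (just a sketch: it assumes
pg_relation_size / pg_size_pretty and, for the index total, pg_indexes_size
from 9.0+, and reads the TOAST relation via pg_class.reltoastrelid; adjust
the table name to your schema):

-- main heap size, excluding TOAST and indexes
select pg_size_pretty(pg_relation_size('posts_1000000'));

-- size of the associated TOAST table
select pg_size_pretty(pg_relation_size(reltoastrelid))
  from pg_class where relname = 'posts_1000000';

-- total size of all indexes on the table
select pg_size_pretty(pg_indexes_size('posts_1000000'));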