Re: Full text search with ORDER BY performance issue

Robert Haas <robertmhaas@xxxxxxxxx> · Wed, 29 Jul 2009 10:02:11 -0400

On Mon, Jul 20, 2009 at 8:12 AM, Oleg Bartunov<oleg@xxxxxxxxxx> wrote:
>> Here's a couple of queries:
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') order by timestamp desc limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual
>> time=188441.047..188441.148 rows=24 loops=1)
>>  ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual
>> time=188441.043..188441.079 rows=24 loops=1)
>>        Sort Key: "timestamp"
>>        Sort Method:  top-N heapsort  Memory: 42kB
>>        ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635
>> width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
>>              Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
>>              ->  Bitmap Index Scan on timestamp_comment_gin
>> (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664
>> rows=259828 loops=1)
>>                    Index Cond: (comment_tsv @@
>> plainto_tsquery('love'::text))
>> Total runtime: 188442.617 ms
>> (9 rows)
>>
>> archive=> explain analyze select * from a where  comment_tsv @@
>> plainto_tsquery('love') limit 24 offset 0;
>>
>> QUERY PLAN
>> ----------
>> Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647
>> rows=24 loops=1)
>>  ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual
>> time=14.629..53.588 rows=24 loops=1)
>>        Filter: (comment_tsv @@ plainto_tsquery('love'::text))
>> Total runtime: 53.731 ms
>> (4 rows)
>>
>> First one runs painfully slow.
>
> Hmm, everything is already written in explain :) In the first query 253635
> rows should be readed from disk and sorted, while in the
> second query only 24 (random) rows readed from disk, so there is 4
> magnitudes
> difference and in the worst case you should expected time for the 1st query
> about 53*10^4 ms.

If love is an uncommon word, there's no help for queries of this type
being slow unless the GIN index can return the results in order.  But
if love is a common word, then it would be faster to do an index scan
by timestamp on the baserel and then treat comment_tsv @@
plainto_tsquery('love') as a filter condition.  Is this a selectivity
estimation bug?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance