Re: strange query plan with LIMIT

On Wednesday 08 June 2011 02:40, tv@xxxxxxxx wrote:
> Hi,
>
> why are you reposting this? Pavel Stehule already recommended you to run
> ANALYZE on the tdiag table - have you done that? What was the effect?

The mailing list system hiccupped and I ended up with two posts.

VACUUM ANALYZE has been done, more than once.
Setting the statistics target on the diag_id column to 1000 seemed only to
make the query slightly slower.
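For reference, the statistics change amounts to something like this (a sketch
of what was run, not the exact session):

    alter table tdiag alter column diag_id set statistics 1000;
    analyze tdiag;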

>
> The stats are off - e.g. the bitmap scan says
>
>   ->  Bitmap Heap Scan on tdiag  (cost=25763.48..638085.13 rows=1141019 width=114) (actual time=43.232..322.441 rows=86530 loops=1)
>
> so it expects to get 1141019 rows but it gets 86530, i.e. about 7% of the
> expected number. That might be enough to cause bad plan choice and thus
> performance issues.

What seems odd to me is that the only difference between the two is the limit 
clause:

select * from tdiag
where (create_time >= '2011-06-03 09:49:04.000000+0'
   and create_time <  '2011-06-06 09:59:04.000000+0')
order by diag_id limit 1;

select * from tdiag
where (create_time >= '2011-06-03 09:49:04.000000+0'
   and create_time <  '2011-06-06 09:59:04.000000+0')
order by diag_id;

and yet the plan completely changes.

I think that I have to force the evaluation order to get a reliably fast 
result:

begin;
create temporary table tt on commit drop as
    select diag_id from tdiag
    where create_time >= '2011-06-03 09:49:04.000000+0'
      and create_time <  '2011-06-06 09:59:04.000000+0';
select * from tdiag where diag_id in (select * from tt)
    order by diag_id limit 10;
commit;

                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3566.24..3566.27 rows=10 width=112) (actual time=1800.699..1800.736 rows=10 loops=1)
   ->  Sort  (cost=3566.24..3566.74 rows=200 width=112) (actual time=1800.694..1800.708 rows=10 loops=1)
         Sort Key: tdiag.diag_id
         Sort Method:  top-N heapsort  Memory: 18kB
         ->  Nested Loop  (cost=1360.00..3561.92 rows=200 width=112) (actual time=269.087..1608.324 rows=86530 loops=1)
               ->  HashAggregate  (cost=1360.00..1362.00 rows=200 width=4) (actual time=269.052..416.898 rows=86530 loops=1)
                     ->  Seq Scan on tt  (cost=0.00..1156.00 rows=81600 width=4) (actual time=0.020..120.323 rows=86530 loops=1)
               ->  Index Scan using tdiag_pkey on tdiag  (cost=0.00..10.99 rows=1 width=112) (actual time=0.006..0.008 rows=1 loops=86530)
                     Index Cond: (tdiag.diag_id = tt.diag_id)
 Total runtime: 1801.290 ms
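A lighter-weight way to get the same effect, without the temporary table, might
be the "offset 0" optimizer fence (a sketch only; it relies on PostgreSQL not
flattening a subquery that contains OFFSET, which held in releases of this era):

    select * from (
        select * from tdiag
        where create_time >= '2011-06-03 09:49:04.000000+0'
          and create_time <  '2011-06-06 09:59:04.000000+0'
        offset 0
    ) sub
    order by diag_id limit 10;

I haven't verified that this beats the temp-table approach here, but it forces
the time-range filter to be evaluated before the ORDER BY ... LIMIT.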

>
> And yet another recommendation - the sort is performed on disk, so give it
> more work_mem and it should be much faster (should change from "merge
> sort" to "quick sort"). Try something like work_mem=20MB and see if it
> does the trick.

This certainly speeds up the sorting.
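For anyone trying the same thing, the setting can be tested per-session before
touching postgresql.conf:

    set work_mem = '20MB';
    -- re-run the query; EXPLAIN ANALYZE should then report an in-memory
    -- "quicksort" instead of an "external merge" sort on disk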

>
> regards
> Tomas

-- 
Anthony Shipman                 | What most people think about
Anthony.Shipman@xxxxxxxxxxxxx   | most things is mostly wrong.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


