Re: ORDER BY, LIMIT and indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/08/13 22:46, Ivan Voras wrote:
Here are two more unexpected results. Same test table (1 mil. records,
"id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
before the experiments):

ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..1.71 rows=10 width=9) (actual
time=142.669..142.680 rows=10 loops=1)
    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9)
(actual time=142.665..142.672 rows=10 loops=1)
          Filter: (id > 900000)
  Total runtime: 142.735 ms
(4 rows)

Note the Seq Scan.

ivoras=# explain analyze select * from lt where id > 900000;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9)
(actual time=38.462..85.780 rows=100000 loops=1)
    Recheck Cond: (id > 900000)
    ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630
width=0) (actual time=38.310..38.310 rows=100000 loops=1)
          Index Cond: (id > 900000)
  Total runtime: 115.674 ms
(5 rows)

This somewhat explains the above case - we are simply fetching 100,000
records here, and it's slow enough even with the index scan, so
planner skips the index in the former case. BUT, if it did use the
index, it would have been expectedly fast:

ivoras=# set enable_seqscan to off;
SET
ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
  Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
rows=10 loops=1)
    ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17
rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
          Index Cond: (id > 900000)
  Total runtime: 0.175 ms
(4 rows)

It looks like the problem is in the difference between what the
planner expects and what the Filter or Index operations deliver:
(cost=0.00..17402.00 rows=101630 width=9) (actual
time=142.665..142.672 rows=10 loops=1).



Hmm - I wonder if the lack or ORDER BY is part of the problem here. Consider a similar query on pgbench_accounts:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 limit 20;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 rows=20 loops=1) -> Seq Scan on pgbench_accounts (cost=0.00..499187.31 rows=10994846 width=4) (actual time=0.005..0.463 rows=20 loops=1)
         Filter: (aid > 100000)
 Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.030 ms
(4 rows)


So at some point you get index scans. Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 order by aid limit 20;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
--
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 rows=20 loops=1) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 rows=20 loops=1
)
         Index Cond: (aid > 100000)
 Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 order by aid limit 20;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1) -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.029 ms
(4 rows)


...and we have index scans for both cases.

Cheers

Mark


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux