Re: ORDER BY, LIMIT and indexes

Ivan Voras <ivoras@xxxxxxxxxxx> · Tue, 6 Aug 2013 12:04:13 +0200

On 6 August 2013 02:20, Michael Paquier <michael.paquier@xxxxxxxxx> wrote:
>
> On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire <klaussfreire@xxxxxxxxx>
> wrote:
>>
>> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@xxxxxxxxxxx> wrote:
>> > SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>> >
>> > SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> Did you try explain?
>
> And did you run ANALYZE on your table to be sure that you generate correct
> plans?

My question was a theoretical one - about general pgsql abilities, not
a specific case.

But after prodded by you, here's an EXPLAIN ANALYZE for the first case:

ivoras=# explain analyze select * from lt order by id desc limit 10;
                                                              QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34
rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
 Total runtime: 0.115 ms
(3 rows)

(This test table has only 1 mil. records.)

I'm not sure how to interpret the last line:
(cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042
rows=10 loops=1)

It seems to me like the planner thinks the Index Scan operation will
return the entire table, and expects both a huge cost and all of the
records, but its actual implementation does return only 10 records.
The Limit operation is a NOP in this case. Is this correct?

In the second case (with OFFSET):
ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
                                                              QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34
rows=1000000 width=9) (actual time=0.040..0.061 rows=20 loops=1)
 Total runtime: 0.136 ms
(3 rows)

It looks like the Index Scan implementation actually returns the first
20 records - which means that for the last "page" of the supposed
paginated display, pgsql will actually internally read the entire
table as an operation result in memory and discard almost all of it in
the Limit operation. Right?

And for the third case (with another column in WITH):
ivoras=# update lt set active = false where (id % 2) = 0;
UPDATE 500000
ivoras=# explain analyze select * from lt where active order by id
desc limit 10;
                                                              QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186
rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43
rows=1500000 width=9) (actual time=0.132..0.174 rows=10 loops=1)
         Filter: active
 Total runtime: 0.244 ms
(4 rows)

It looks like filtering is performed in the Index Scan operation -
which I never expected. Is this usually done or only for simple
queries. In other words, is there a circumstance where it is NOT done
in this way, and should I care?

Based on the results with OFFSET it looks like it would be a bad idea
to use pgsql for allowing the user to browse through the records in a
paginated form if the table is huge. I would very much like to be
proven wrong :)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance