On Tue, Aug 6, 2013 at 8:03 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
> On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood
> <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
>> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
>> Consider a similar query on pgbench_accounts:
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid > 100000
>> limit 20;
>>                                                          QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 rows=20 loops=1)
>>    ->  Seq Scan on pgbench_accounts  (cost=0.00..499187.31 rows=10994846 width=4) (actual time=0.005..0.463 rows=20 loops=1)
>>          Filter: (aid > 100000)
>>  Total runtime: 0.474 ms
>> (4 rows)
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid >
>> 10000000 limit 20;
>>                                                          QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 rows=20 loops=1)
>>          Index Cond: (aid > 10000000)
>>  Total runtime: 0.030 ms
>> (4 rows)
>>
>> So at some point you get index scans.
>> Now add an ORDER BY:
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid > 100000
>> order by aid limit 20;
>>                                                          QUERY PLAN
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 rows=20 loops=1)
>>          Index Cond: (aid > 100000)
>>  Total runtime: 0.023 ms
>> (4 rows)
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid >
>> 10000000 order by aid limit 20;
>>                                                          QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 rows=20 loops=1)
>>          Index Cond: (aid > 10000000)
>>  Total runtime: 0.029 ms
>> (4 rows)
>>
>> ...and we have index scans for both cases.
>>
>> Cheers
>>
>> Mark
>
> Yes, but those index scan decisions are driven by the wrong factor:
> in the last two cases, the need for rows to be ordered; in the second
> case, the estimated number of tuples in the scan.
>
> In both cases, that's not the driving factor behind the right
> decision. The driving factor *should* be startup cost, which is
> nonzero because there is a filter being applied to that sequential
> scan that filters out many of the initial tuples. With a nonzero
> startup cost, the cost of the limited plan would be "startup cost +
> scan cost * scanned fraction". When the scanned fraction is low
> enough, the startup cost dominates the equation.
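The "startup cost + scan cost * scanned fraction" formula quoted above can be sketched numerically. This is illustrative arithmetic only, not PostgreSQL's actual costing code; the function name is made up, and the numbers are taken from the plans quoted earlier in the thread:

```python
# Sketch of the limited-plan cost formula described above. Illustrative
# arithmetic, not PostgreSQL's costing code.

def limited_cost(startup_cost, total_cost, rows_estimated, rows_needed):
    # Under a LIMIT, a plan is charged its full startup cost plus only the
    # fraction of the run cost it is expected to execute.
    fraction = min(1.0, rows_needed / rows_estimated)
    return startup_cost + (total_cost - startup_cost) * fraction

# Numbers from the plans quoted above (aid > 100000, LIMIT 20):
# seq scan total cost 499187.31, ordered index scan total cost 1235355.34,
# 10994846 estimated matching rows.
seqscan = limited_cost(0.0, 499187.31, 10994846, 20)     # ~0.91, as in the plan
indexscan = limited_cost(0.0, 1235355.34, 10994846, 20)  # ~2.25, as in the plan

# With both startup costs at zero, the seq scan's limited cost comes out
# smaller, which is why the planner favors it when no ORDER BY forces the
# index -- even though the seq scan may filter many tuples before the first
# match.
```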
>
> With a min/max index, a cheap query to that index could estimate at
> least a lower bound to that initial zero-rows-output cost. With b-tree
> indexes, not so much (the b-tree would have to be traversed until the
> first filter-passing tuple, which could be a while).

Ok, silly me. A min/max index would *eliminate* the startup cost.

So, what's a good way to estimate it? Perhaps value-page correlation
could be used, but it would cover a really narrow case (monotonic
sequences).

Alternatively, a token startup cost could be added to those kinds of
filtered sequential scans when the filtering term is selective enough.
That would offset the cost just a little bit, but enough to favor the
index over the sequential scan in the right cases.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
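The token-startup-cost idea proposed in the message above can be illustrated with a toy model. Everything here is hypothetical — the function names, the token value, and the selectivity threshold are invented for this sketch and are not anything PostgreSQL implements:

```python
# Hypothetical sketch of the "token startup cost" proposal: when a seq scan's
# filter rejects most of the table, charge it a small nonzero startup cost so
# that under a LIMIT it no longer looks artificially free. Not PostgreSQL code.

def limited_cost(startup, total, rows_est, rows_needed):
    # Planner-style cost of a plan truncated by LIMIT: full startup cost plus
    # the fraction of the run cost we expect to execute.
    fraction = min(1.0, rows_needed / rows_est)
    return startup + (total - startup) * fraction

def token_startup(rows_total, rows_matching, token=10.0, threshold=0.5):
    # Charge a token startup cost only when the filter is selective enough,
    # i.e. it rejects more than `threshold` of the table's tuples. Both the
    # token value and the threshold are made-up illustration constants.
    rejected = 1.0 - rows_matching / rows_total
    return token if rejected > threshold else 0.0

# A table of 10M rows where the filter matches 2M of them, LIMIT 20.
rows_total, rows_est, need = 10_000_000, 2_000_000, 20

seq_naive = limited_cost(0.0, 500_000.0, rows_est, need)    # current behavior
seq_token = limited_cost(token_startup(rows_total, rows_est),
                         500_000.0, rows_est, need)         # with the proposal
index_limited = limited_cost(0.0, 1_200_000.0, rows_est, need)

# Without the token cost the filtered seq scan wins; the small offset is
# enough to tip the choice toward the index scan, as described above.
```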