On Tue, Aug 6, 2013 at 8:03 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
> On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood
> <mark.kirkwood@xxxxxxxxxxxxxxx> wrote:
>> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
>> Consider a similar query on pgbench_accounts:
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid > 100000
>> limit 20;
>>                                                          QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 rows=20 loops=1)
>>    ->  Seq Scan on pgbench_accounts  (cost=0.00..499187.31 rows=10994846 width=4) (actual time=0.005..0.463 rows=20 loops=1)
>>          Filter: (aid > 100000)
>>  Total runtime: 0.474 ms
>> (4 rows)
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid >
>> 10000000 limit 20;
>>                                                          QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 rows=20 loops=1)
>>          Index Cond: (aid > 10000000)
>>  Total runtime: 0.030 ms
>> (4 rows)
>>
>> So at some point you get index scans.
>> Now add an ORDER BY:
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid > 100000
>> order by aid limit 20;
>>                                                          QUERY PLAN
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 rows=20 loops=1)
>>          Index Cond: (aid > 100000)
>>  Total runtime: 0.023 ms
>> (4 rows)
>>
>> bench=# explain analyze select aid from pgbench_accounts where aid >
>> 10000000 order by aid limit 20;
>>                                                          QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
>>    ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 rows=20 loops=1)
>>          Index Cond: (aid > 10000000)
>>  Total runtime: 0.029 ms
>> (4 rows)
>>
>> ...and we have index scans for both cases.
>>
>> Cheers
>>
>> Mark
>
> Yes, but those index scan decisions are driven by the wrong factor:
> in the last two cases, the need for rows to be ordered; in the second
> case, the estimated number of tuples in the scan.
>
> In both cases, that's not the driving factor behind the right
> decision. The driving factor *should* be startup cost, which is
> nonzero because there is a filter being applied to that sequential
> scan that filters out many of the initial tuples. With a nonzero
> startup cost, the cost of the limited plan would be "startup cost +
> scan cost * scanned fraction". When the scanned fraction is low
> enough, the startup cost dominates the equation.
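The "startup cost + scan cost * scanned fraction" formula quoted above can be sketched numerically. This is illustrative arithmetic only, not PostgreSQL's actual costing code; the function name is made up, and the numbers are taken from the plans quoted earlier in the thread:

```python
# Sketch of the limited-plan cost formula described above. Illustrative
# arithmetic, not PostgreSQL's costing code.

def limited_cost(startup_cost, total_cost, rows_estimated, rows_needed):
    # Under a LIMIT, a plan is charged its full startup cost plus only the
    # fraction of the run cost it is expected to execute.
    fraction = min(1.0, rows_needed / rows_estimated)
    return startup_cost + (total_cost - startup_cost) * fraction

# Numbers from the plans quoted above (aid > 100000, LIMIT 20):
# seq scan total cost 499187.31, ordered index scan total cost 1235355.34,
# 10994846 estimated matching rows.
seqscan = limited_cost(0.0, 499187.31, 10994846, 20)     # ~0.91, as in the plan
indexscan = limited_cost(0.0, 1235355.34, 10994846, 20)  # ~2.25, as in the plan

# With both startup costs at zero, the seq scan's limited cost comes out
# smaller, which is why the planner favors it when no ORDER BY forces the
# index -- even though the seq scan may filter many tuples before the first
# match.
```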
>
> With a min/max index, a cheap query to that index could estimate at
> least a lower bound to that initial zero-rows-output cost. With b-tree
> indexes, not so much (the b-tree would have to be traversed until the
> first filter-passing tuple, which could be a while).

Ok, silly me. A min/max index would *eliminate* the startup cost.

So, what's a good way to estimate it? Perhaps value-page correlation
could be used, but it would cover a really narrow case (monotonic
sequences).

Alternatively, a token startup cost could be added to those kinds of
filtered sequential scans when the filtering term is selective enough.
That would offset the cost just a little bit, but enough to favor the
index over the sequential scan in the right cases.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
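The token-startup-cost idea proposed in the message above can be illustrated with a toy model. Everything here is hypothetical — the function names, the token value, and the selectivity threshold are invented for this sketch and are not anything PostgreSQL implements:

```python
# Hypothetical sketch of the "token startup cost" proposal: when a seq scan's
# filter rejects most of the table, charge it a small nonzero startup cost so
# that under a LIMIT it no longer looks artificially free. Not PostgreSQL code.

def limited_cost(startup, total, rows_est, rows_needed):
    # Planner-style cost of a plan truncated by LIMIT: full startup cost plus
    # the fraction of the run cost we expect to execute.
    fraction = min(1.0, rows_needed / rows_est)
    return startup + (total - startup) * fraction

def token_startup(rows_total, rows_matching, token=10.0, threshold=0.5):
    # Charge a token startup cost only when the filter is selective enough,
    # i.e. it rejects more than `threshold` of the table's tuples. Both the
    # token value and the threshold are made-up illustration constants.
    rejected = 1.0 - rows_matching / rows_total
    return token if rejected > threshold else 0.0

# A table of 10M rows where the filter matches 2M of them, LIMIT 20.
rows_total, rows_est, need = 10_000_000, 2_000_000, 20

seq_naive = limited_cost(0.0, 500_000.0, rows_est, need)    # current behavior
seq_token = limited_cost(token_startup(rows_total, rows_est),
                         500_000.0, rows_est, need)         # with the proposal
index_limited = limited_cost(0.0, 1_200_000.0, rows_est, need)

# Without the token cost the filtered seq scan wins; the small offset is
# enough to tip the choice toward the index scan, as described above.
```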