On Wed, Dec 2, 2009 at 10:32 AM, Laurent Laborde <kerdezixe@xxxxxxxxx> wrote:
> * without order by, limit 5 : 70ms
> ----------------------------------
> explain analyze SELECT *
> FROM _article
> WHERE (_article.bitfield && getbit(0))
> LIMIT 5;
>
> QUERY PLAN :
> Limit  (cost=0.00..20.03 rows=5 width=1109) (actual time=70.190..70.265 rows=5 loops=1)
>   ->  Index Scan using idx_article_bitfield on _article  (cost=0.00..69290.99 rows=17298 width=1109) (actual time=70.188..70.260 rows=5 loops=1)
>         Index Cond: (bitfield && B'1'::bit varying)
> Total runtime: 70.406 ms
> (4 rows)
>
> * without order by, limit 500 (same plan as above) : 371ms
> ------------------------------------------------------------------
> explain analyze SELECT *
> FROM _article
> WHERE (_article.bitfield && getbit(0))
> LIMIT 500;
>
> QUERY PLAN :
> Limit  (cost=0.00..2002.86 rows=500 width=1109) (actual time=0.087..371.257 rows=500 loops=1)
>   ->  Index Scan using idx_article_bitfield on _article  (cost=0.00..69290.99 rows=17298 width=1109) (actual time=0.086..371.075 rows=500 loops=1)
>         Index Cond: (bitfield && B'1'::bit varying)
> Total runtime: 371.369 ms
>
> * without order by, limit 5000 (query plan changed) : 1307ms
> -------------------------------------------------------------------
> explain analyze SELECT *
> FROM _article
> WHERE (_article.bitfield && getbit(0))
> LIMIT 5000;
>
> QUERY PLAN :
> Limit  (cost=138.34..18971.86 rows=5000 width=1109) (actual time=53.782..1307.173 rows=5000 loops=1)
>   ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79 rows=17298 width=1109) (actual time=53.781..1305.565 rows=5000 loops=1)
>         Recheck Cond: (bitfield && B'1'::bit varying)
>         ->  Bitmap Index Scan on idx_article_bitfield  (cost=0.00..134.01 rows=17298 width=0) (actual time=53.606..53.606 rows=6743 loops=1)
>               Index Cond: (bitfield && B'1'::bit varying)
> Total runtime: 1307.972 ms
>
> So... *without* "order by", differents limit and different query plan :
> the queries are fast.
> * with order by, limit 5 :
> ------------------------------
> explain analyze SELECT *
> FROM _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 5;
>
> QUERY PLAN :
> Mmmm.... the query is running since 2h ... waiting, waiting.
>
> * with order by, limit 500 : 546ms
> -------------------------------
> explain analyze SELECT *
> FROM _article
> WHERE (_article.bitfield && getbit(0))
> ORDER BY _article.id ASC
> LIMIT 500;
>
> QUERY PLAN :
> Limit  (cost=66156.73..66157.98 rows=500 width=1109) (actual time=545.671..545.900 rows=500 loops=1)
>   ->  Sort  (cost=66156.73..66199.98 rows=17298 width=1109) (actual time=545.670..545.766 rows=500 loops=1)
>         Sort Key: id
>         Sort Method:  top-N heapsort  Memory: 603kB
>         ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79 rows=17298 width=1109) (actual time=1.059..541.359 rows=6729 loops=1)
>               Recheck Cond: (bitfield && B'1'::bit varying)
>               ->  Bitmap Index Scan on idx_article_bitfield  (cost=0.00..134.01 rows=17298 width=0) (actual time=0.922..0.922 rows=6743 loops=1)
>                     Index Cond: (bitfield && B'1'::bit varying)
> Total runtime: 546.163 ms
>
> Now... with ordery by, different limit, different query plan, the
> limit 5 query is insanly *SLOW* (while the limit 500 is super fast).
>
> What is think : The query planner do not consider the time taken by
> the order by... which is *much* slower !!

That is certainly not the case. If the query planner did not consider the time required to perform a sort, that would have been fixed long before now. The real problem here is exactly what I said upthread.

Without ORDER BY, the query planner picks an index scan or a bitmap index scan and just runs it until it gets enough rows to satisfy the LIMIT. No problem. With ORDER BY, it has to make a decision: should it fetch ALL the rows that satisfy the bitfield condition, sort them by article ID, and then pick the top five?
Or should it instead use the index on article ID to start retrieving the lowest-numbered article IDs, hoping to find 5 that satisfy the bitfield condition before it goes through too many rows?

The answer depends on how frequently the bitfield condition is satisfied. If most rows in the table satisfy it, the second plan is better; if very few do, the first plan is better.

Somewhat more subtly, the choice also depends on the LIMIT. The first plan requires almost the same amount of work for a small limit as for a large one - you still have to find ALL the rows that match the bitfield condition and sort them; the LIMIT only changes how many rows of the sorted result get returned. But the amount of work the second plan requires varies dramatically with the LIMIT. If the LIMIT is only one-hundredth as large (5 instead of 500), the second plan figures to scan only one-hundredth as many rows, so LIMIT 5 costs about a hundredth as much as LIMIT 500, whereas the cost of the first plan has barely changed.

The exact break-even point between the two plans varies with the percentage of rows in the table that satisfy the bitfield condition. In your particular case, the break-even point is less than one row, so the first plan is always better, but the planner doesn't know that. I don't think the planner has any statistics that can tell it how many rows in the table satisfy (_article.bitfield && getbit(0)), and it's probably estimating high, because the actual selectivity based on the numbers you provided is quite low. That makes the second plan look appealing for small LIMITs. And if the rows it needs are clustered at the "wrong end" of the article-ID index - which the planner certainly has no way of knowing - things get really ugly.
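To make that trade-off concrete, here is a back-of-the-envelope cost model. This is my own toy sketch, not PostgreSQL's actual cost formulas, and the 10-million-row table size is an assumption I'm making up to get a selectivity out of the estimated 17298 matches:

```python
import math

def sort_plan_cost(matching_rows, cost_per_row=1.0):
    # Plan 1: scan every matching row, then sort them all.
    # The LIMIT barely affects the total - the scan and sort dominate.
    n = max(matching_rows, 2)
    return n * cost_per_row + n * math.log2(n)

def ordered_index_plan_cost(limit, selectivity, cost_per_row=1.0):
    # Plan 2: walk the article-ID index in order, testing each row
    # against the bitfield condition; on average we expect to read
    # about limit / selectivity rows before finding enough matches.
    return (limit / selectivity) * cost_per_row

# Planner's estimate: 17298 matching rows; assume (hypothetically)
# a table of 10 million rows, giving selectivity of about 0.0017.
sel = 17298 / 10_000_000

print(ordered_index_plan_cost(5, sel))    # ~2,890 units - looks cheap
print(ordered_index_plan_cost(500, sel))  # ~289,000 units - 100x more
print(sort_plan_cost(17298))              # ~261,000 units - fixed
```

On these made-up numbers, plan 2 looks roughly 90x cheaper at LIMIT 5 and loses at LIMIT 500 - which matches the plan flip you saw. The catch is that "limit / selectivity rows" assumes the matches are spread evenly through the ID index; if they sit at the far end, plan 2's real row count blows up far past the estimate.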
I've sometimes thought that the planner should outright discard plans whose total cost (ignoring the effect of LIMIT) is too many orders of magnitude higher than that of some other available plan. But it's hard to know where to put the cutoff, and there are cases where the planner makes this kind of trade-off and gets it right, which we don't want to break, so it's not a simple problem.

The best solution I've found is to avoid using expressions that depend on operators other than equality whenever possible. The planner is very smart about equality. It's somewhat smart about >, <, >=, <=, and <>, and pretty stupid about most other things.

...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
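P.S. on why equality is so much friendlier to the planner - a purely conceptual sketch, not PostgreSQL internals (the function names and the 0.005 default below are invented for illustration): for an equality predicate the planner can look the value up in the column's most-common-values statistics, while for an operator like && on a bit varying column it has no applicable statistics and can only fall back on a generic guess.

```python
def eq_selectivity(mcv, value, default=0.005):
    # Equality: if the value appears in the column's most-common-values
    # list, its observed frequency is a direct selectivity estimate.
    return mcv.get(value, default)

def opaque_operator_selectivity():
    # An operator with no statistics support: nothing to look up, so
    # the estimate is a hard-coded guess regardless of the actual data.
    return 0.005  # made-up default, standing in for a magic constant

status_mcv = {"published": 0.90, "draft": 0.09, "deleted": 0.01}
print(eq_selectivity(status_mcv, "deleted"))  # 0.01 - grounded in stats
print(opaque_operator_selectivity())          # 0.005 - a blind guess
```

With the equality estimate the planner would correctly treat "deleted" rows as rare; with the blind guess it may wildly over- or under-estimate, which is exactly what makes plans like the ordered-index scan above look deceptively cheap.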