Re: Searching for the cause of a bad plan

Simon Riggs <simon@xxxxxxxxxxxxxxx> · Fri, 21 Sep 2007 14:36:17 +0100

On Fri, 2007-09-21 at 14:12 +0200, Csaba Nagy wrote:
> On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote:
> > On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
> > 
> > > > Can you plans with/without LIMIT and with/without cursor, for both b1
> > > > and b2?
> > > 
> > > The limit is unfortunately absolutely needed part of the query
> > 
> > Understood, but not why I asked...
> > 
> Well, the same query without limit goes:

OK, thanks.

> I'm not sure what you mean without cursor, maybe not using prepare ?

Sorry, misread that.

=======================

I think I understand now: The cost of the LIMIT is being applied, but in
slightly the wrong way. The cost of the Nested Loop node is reduced by
the fraction of LIMIT/(number of expected rows), which is only an
approximation of what we're doing. In Plan 2 this leads to the wildly
wrong estimate that each row costs 49,851 cost units to retrieve, which
is about x50 wrong. In Plan 3 that approximation leads to a more
reasonable cost, so this works in Plan 3, but doesn't in Plan 2. 

What we should do is push down the effect of the LIMIT so that the cost
of the Index Scan on ta reflects the fact that it returns only 10 rows.
It correctly expects 388638 rows that match the value requested, but it
is not retrieving all of them. The executor handles the query
efficiently but the cost model doesn't reflect what the executor
actually does and so we pick the wrong plan. Pushing down the LIMIT
would only be possible when LIMIT has a constant value at plan time, but
that seems like most of the time to my eyes.

The plan estimates should look like this for Plan 2 (marked **)

 Limit  (cost=0.00..XXXX rows=10 width=1804)
   ->  Nested Loop  (cost=0.00..XXXXX rows=10 width=1804)
         ->  Index Scan using pk_table_a on table_a ta
(cost=0.00..**11.96** rows=**10** width=16)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb
(cost=0.00..3.77 rows=1 width=1788)
               Index Cond: (ta.b = tb.b)

Incidentally, the way out of this is to improve the stats by setting
stats target = 1000 on column a of ta. That will allow the optimizer to
have a better estimate of the tail of the distribution of a, which
should then be more sensibly reflected in the cost of the Index Scan.
That doesn't solve the actual problem, but should help in your case.

Plans copied below for better clarity:

Plan 2:

db> explain analyze execute test_001(31855344);

QUERY
PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual
time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual
time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta
(cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198
rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb
(cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1
loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

Plan 3:

db> explain analyze execute test_001(31855344);

QUERY
PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..853.14 rows=10 width=325) (actual
time=20.117..28.104 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..2024323.48 rows=23728 width=325) (actual
time=20.116..28.101 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta
(cost=0.00..327561.01 rows=388684 width=16) (actual time=0.023..0.027
rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b1 on table_b1 tb
(cost=0.00..4.35 rows=1 width=309) (actual time=14.032..14.034 rows=1
loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 28.200 ms

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match