On Fri, 2007-09-21 at 14:12 +0200, Csaba Nagy wrote: > On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote: > > On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote: > > > > > > Can you plans with/without LIMIT and with/without cursor, for both b1 > > > > and b2? > > > > > > The limit is unfortunately absolutely needed part of the query > > > > Understood, but not why I asked... > > > Well, the same query without limit goes: OK, thanks. > I'm not sure what you mean without cursor, maybe not using prepare ? Sorry, misread that. ======================= I think I understand now: The cost of the LIMIT is being applied, but in slightly the wrong way. The cost of the Nested Loop node is reduced by the fraction of LIMIT/(number of expected rows), which is only an approximation of what we're doing. In Plan 2 this leads to the wildly wrong estimate that each row costs 49,851 cost units to retrieve, which is about x50 wrong. In Plan 3 that approximation leads to a more reasonable cost, so this works in Plan 3, but doesn't in Plan 2. What we should do is push down the effect of the LIMIT so that the cost of the Index Scan on ta reflects the fact that it returns only 10 rows. It correctly expects 388638 rows that match the value requested, but it is not retrieving all of them. The executor handles the query efficiently but the cost model doesn't reflect what the executor actually does and so we pick the wrong plan. Pushing down the LIMIT would only be possible when LIMIT has a constant value at plan time, but that seems like most of the time to my eyes. The plan estimates should look like this for Plan 2 (marked **) Limit (cost=0.00..XXXX rows=10 width=1804) -> Nested Loop (cost=0.00..XXXXX rows=10 width=1804) -> Index Scan using pk_table_a on table_a ta (cost=0.00..**11.96** rows=**10** width=16) Index Cond: (a = $1) -> Index Scan using pk_table_b2 on table_b2 tb (cost=0.00..3.77 rows=1 width=1788) Index Cond: (ta.b = tb.b) Incidentally, the way out of this is to improve the stats by setting stats target = 1000 on column a of ta. That will allow the optimizer to have a better estimate of the tail of the distribution of a, which should then be more sensibly reflected in the cost of the Index Scan. That doesn't solve the actual problem, but should help in your case. Plans copied below for better clarity: Plan 2: db> explain analyze execute test_001(31855344); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1) -> Nested Loop (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1) -> Index Scan using pk_table_a on table_a ta (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1) Index Cond: (a = $1) -> Index Scan using pk_table_b2 on table_b2 tb (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2) Index Cond: (ta.b = tb.b) Total runtime: 21.876 ms Plan 3: db> explain analyze execute test_001(31855344); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..853.14 rows=10 width=325) (actual time=20.117..28.104 rows=2 loops=1) -> Nested Loop (cost=0.00..2024323.48 rows=23728 width=325) (actual time=20.116..28.101 rows=2 loops=1) -> Index Scan using pk_table_a on table_a ta (cost=0.00..327561.01 rows=388684 width=16) (actual time=0.023..0.027 rows=2 loops=1) Index Cond: (a = $1) -> Index Scan using pk_table_b1 on table_b1 tb (cost=0.00..4.35 rows=1 width=309) (actual time=14.032..14.034 rows=1 loops=2) Index Cond: (ta.b = tb.b) Total runtime: 28.200 ms -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match