Searching for the cause of a bad plan

Csaba Nagy <nagy@xxxxxxxxxxxxxx> · Fri, 21 Sep 2007 12:03:44 +0200

Hi all,

Postgres version: 8.2.4

Tables:

table_a(a bigint, b bigint, primary key(a, b) );

table_b1(b bigint primary key, more columns...);

table_b2(b bigint primary key references table_b1(b), more columns...);

table_b1: 
 ~ 27M rows;
 ~25 more columns;
 width=309 (as reported by explain select *);

table_a:
 ~400M rows;
 - column "b" should reference table_b1, but it does not for performance
reasons (it is an insert only table);
 - column "a" distinct values: 1148
 - has (a, b) as primary key;
 - has no additional columns;

table_b1:
 ~40K rows;
 ~70 more columns;
 width=1788 (as reported by explain select *);

Statistics for the involved columns for each table are attached in files
(to preserve the spacing). They were taken after analyzing the relevant
table (except for table_b2 where I added the "fiddled" statistics first
and then remembered to analyze fresh, resulting in the "non_fiddled"
version, which gives the same result as the fiddled one).

The problem query is:

prepare test_001(bigint) as
SELECT tb.*
FROM table_a ta 
JOIN table_b2 tb ON ta.b=tb.b
WHERE ta.a = $1  
ORDER BY ta.a, ta.b
limit 10;

Explain gives Plan 1 (see attached plans.txt)

If I set enable_hashjoin=off and enable_mergejoin=off, I get Plan 2
(again, see plans.txt).

The difference is a 30x improvement in the second case...
(I actually forgot to account for cache effects, but later rerun the
queries multiple times and the timings are proportional).

Additionally, if I replace table_b2 with table_b1 in the query, I get
Plan 3 (with reasonable execution time) with both enable_hashjoin and
enable_mergejoin on. So there is something which makes table_b2
different from table_b1 for planning purposes, but I could not identify
what that is... they have differences in statistics, but fiddling with
the stats gave me no difference in the plan.

Looking at Plan 2, it looks like the "limit" step is estimating wrongly
it's cost. I guessed that it does that because it thinks the "b" values
selected from table_a for a given "a" span a larger range than the "b"
values in table_b2, because the "b" values in table_b2 are a (relatively
small) subset of the "b" values in table_a. But this is not the case,
the query only gets "a" values for which all the "b" values in table_a
will be found in table_b2. Of course the planner has no way to know
this, but then I think it is not the case, as I tried to copy the
histogram statistics in pg_statistic for the column "b" from the entry
for table_b1 (which contains the whole span of "b" values) to the entry
for table_b2, with no change in the plan.

Just for the record, this query is just a part of a more complex one,
which joins in bigger tables, resulting in even worse performance, but I
tracked it down to refusing the nested loop to be the problem.

Is there anything I could do to convince the planner to use here the
nested loop plan ?

Thanks,
Csaba.

 attname | null_frac | avg_width | n_distinct |                                      most_common_vals                                       |                                      most_common_freqs                                      |                                           histogram_bounds                                           | correlation 
---------+-----------+-----------+------------+---------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+-------------
 a       |         0 |         8 |       1148 | {31826743,31855101,31855343,31854918,31856328,31861573,31855122,31855130,31855189,31856426} | {0.005,0.00466667,0.00433333,0.004,0.004,0.004,0.00366667,0.00366667,0.00366667,0.00366667} | {31734956,31854364,31854732,31855162,31855660,31857144,31858109,31858965,31859762,31860576,31861566} |    0.999608 
 b       |         0 |         8 |         -1 |                                                                                             |                                                                                             | {63977,36878147,42247866,42548692,42812320,46992026,51444368,55977972,56607708,59496742,68530614}    |    0.602959 

 attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                                     histogram_bounds                                                                                                                                     | correlation 
---------+-----------+-----------+------------+------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 b       |         0 |         8 |         -1 |                  |                   | {21302091,26264614,32786793,35823704,42390431,43618589,47178753,48479147,49297942,50285891,51142864,52087189,52975318,53873562,54752183,55651893,56643636,57526056,58344883,59143872,60073970,60934277,61858188,62759965,63699252,64585260,65587784,66517871,67368007,68219640,69112491} |    0.227599

 attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                                                                                                                     histogram_bounds                                                                                                                                     | correlation 
---------+-----------+-----------+------------+------------------+-------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------
 b       |         0 |         8 |         -1 |                  |                   | {21310803,24281419,32805654,35860260,42573594,45902732,47896008,48922471,49784797,50658309,51556908,52439479,53316630,54186754,55044558,55945438,56836595,57618314,58587628,59478122,60223564,61063339,61995945,62876263,63769015,64717353,65577667,66475226,67420705,68286405,69034532} |     0.99966

Plan 1:

db> explain analyze execute test_001(31855344);
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=333067.77..333067.79 rows=10 width=1804) (actual time=652.739..652.741 rows=2 loops=1)
   ->  Sort  (cost=333067.77..333067.86 rows=36 width=1804) (actual time=652.736..652.736 rows=2 loops=1)
         Sort Key: ta.a, ta.b
         ->  Hash Join  (cost=3364.30..333066.84 rows=36 width=1804) (actual time=648.159..652.709 rows=2 loops=1)
               Hash Cond: (ta.b = tb.b)
               ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324786.18 rows=388532 width=16) (actual time=454.389..460.138 rows=2 loops=1)
                     Index Cond: (a = $1)
               ->  Hash  (cost=524.58..524.58 rows=41658 width=1788) (actual time=187.192..187.192 rows=41671 loops=1)
                     ->  Seq Scan on table_b2 tb  (cost=0.00..524.58 rows=41658 width=1788) (actual time=5.540..113.387 rows=41671 loops=1)
 Total runtime: 652.980 ms

Plan 2:

db> explain analyze execute test_001(31855344);
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

Plan 3:

db> explain analyze execute test_001(31855344);
                                                                      QUERY PLAN                                                                      
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..853.14 rows=10 width=325) (actual time=20.117..28.104 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..2024323.48 rows=23728 width=325) (actual time=20.116..28.101 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..327561.01 rows=388684 width=16) (actual time=0.023..0.027 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b1 on table_b1 tb  (cost=0.00..4.35 rows=1 width=309) (actual time=14.032..14.034 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 28.200 ms

 attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs |                                           histogram_bounds                                           | correlation 
---------+-----------+-----------+------------+------------------+-------------------+------------------------------------------------------------------------------------------------------+-------------
 b       |         0 |         8 |         -1 |                  |                   | {53220192,53236707,53240765,53244670,53248556,53253116,53257333,53261450,53265687,56984462,68866286} |    0.997929

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster