=?GBK?B?s8LR47fJ?= <postgresql_2016@xxxxxxx> writes:
> 1. The first problem is found in PG9.2 for our production version. But I test in the latest version.

You're still running 9.2 in production?  That's ... inadvisable.

> 2. The data is simple. The data in b has many distinct values, and the data with c = 10 has very many rows. I've dumped it in test.log in the email attachment.

Thanks for the test data.  I don't think there is actually anything wrong here, or at least nothing readily improvable.  I get this for your original query:

postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b and t2.c = 10 where t1.a = 1;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.85..16.90 rows=7567 width=24) (actual time=0.014..0.015 rows=1 loops=1)
   ->  Index Scan using a_idx on test t1  (cost=0.43..8.45 rows=1 width=12) (actual time=0.008..0.008 rows=1 loops=1)
         Index Cond: (a = 1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..8.45 rows=1 width=12) (actual time=0.003..0.004 rows=1 loops=1)
         Index Cond: (b = t1.b)
         Filter: (c = 10)
 Planning Time: 0.321 ms
 Execution Time: 0.032 ms
(8 rows)

Slightly different from your estimate, but random sampling or a different statistics-target setting would be enough to explain that.  It is *not* the t2.c = 10 condition that's responsible for the incorrect estimate, because taking it out doesn't change things much:

postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b where t1.a = 1;
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.85..16.90 rows=9088 width=24) (actual time=0.014..0.015 rows=1 loops=1)
   ->  Index Scan using a_idx on test t1  (cost=0.43..8.45 rows=1 width=12) (actual time=0.008..0.009 rows=1 loops=1)
         Index Cond: (a = 1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..8.45 rows=1 width=12) (actual time=0.002..0.003 rows=1 loops=1)
         Index Cond: (b = t1.b)
 Planning Time: 0.312 ms
 Execution Time: 0.031 ms
(7 rows)

Next, let's take out the t1.a = 1 condition:

postgres=# explain analyze select * from test t1 left join test t2 on t1.b = t2.b ;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.43..602605.00 rows=10914402957 width=24) (actual time=0.023..1872651.044 rows=10914402000 loops=1)
   ->  Seq Scan on test t1  (cost=0.00..18506.00 rows=1201000 width=12) (actual time=0.013..62.837 rows=1201000 loops=1)
   ->  Index Scan using b_idx on test t2  (cost=0.43..0.48 rows=1 width=12) (actual time=0.001..0.945 rows=9088 loops=1201000)
         Index Cond: (b = t1.b)
 Planning Time: 0.297 ms
 Execution Time: 2062621.438 ms
(6 rows)

The fundamental join-size estimate, that is the selectivity of t1.b = t2.b, is pretty much dead on here.  And note the actual rows=9088 in the inner indexscan: the actual average number of t2 rows joining to a t1 row is 9088.  Now the reason for the estimate in the second query becomes clear: the planner doesn't know exactly how many rows join to the specific row with t1.a = 1, but it does know that the average number of joined rows should be 9088, so that's its estimate.  In your original query that is reduced a little bit by the not-very-selective "c = 10" condition, but the big picture is the same.
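If you want to see that skew for yourself, queries along these lines should show it (untested on my end; they assume only the test table and columns from your script):

-- fanout of the most common b values across the whole table
select b, count(*) as n
from test
group by b
order by n desc
limit 10;

-- how many t2 rows actually join to the row(s) with a = 1
select count(*)
from test t2
where t2.b in (select t1.b from test t1 where t1.a = 1);

The first lists the b values with the most matches; the second is the per-row fanout the planner would have to know about to get this particular estimate right.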
Basically, the row with "a = 1" has an atypical number of join partners, and that's why the join size estimate is wrong for this particular query.

You might nonetheless complain that the join size estimate should be the product of the two input-scan estimates, but that's not how it works.  (Typically those do match up, but with highly skewed data like this, the fact that they're derived differently can become obvious.)  The size of the b_idx result is based on considering the selectivity of "t2.b = ?" where the comparison value is not known.  Because there are so many b values that are unique, we end up estimating the average number of matching rows as 1, even though it will be far higher for a few values.

			regards, tom lane