On Mon, Oct 18, 2010 at 9:40 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote: > 8.4.5 > > I consistently see HashJoin plans that hash the large table, and scan the small table. This is especially puzzling in some cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan the big one? > > Here is one case I saw just recently > > Hash Cond: ((a.e_id)::text = (ta.name)::text) > -> Index Scan using c_a_s_e_id on a (cost=0.00..8.21 rows=14 width=27) > Index Cond: (id = 12) > -> Hash (cost=89126.79..89126.79 rows=4825695 width=74) > -> Seq Scan on p_a_1287446030 tmp (cost=0.00..89126.79 rows=4825695 width=74) > Filter: (id = 12) Can we have the complex EXPLAIN output here, please? And the query? For example, this would be perfectly sensible if the previous line started with "Hash Semi Join" or "Hash Anti Join". rhaas=# explain select * from little where exists (select * from big where big.a = little.a); QUERY PLAN ----------------------------------------------------------------------- Hash Semi Join (cost=3084.00..3478.30 rows=10 width=4) Hash Cond: (little.a = big.a) -> Seq Scan on little (cost=0.00..1.10 rows=10 width=4) -> Hash (cost=1443.00..1443.00 rows=100000 width=4) -> Seq Scan on big (cost=0.00..1443.00 rows=100000 width=4) (5 rows) I'm also a bit suspicious of the fact that the hash condition has a cast to text on both sides, which implies, to me anyway, that the underlying data types are not text. That might mean that the query planner doesn't have very good statistics, which might mean that the join selectivity estimates are wackadoo, which can apparently cause this problem: rhaas=# explain select * from little, big where little.a = big.a; QUERY PLAN ----------------------------------------------------------------------- Hash Join (cost=3084.00..3577.00 rows=2400 width=8) Hash Cond: (little.a = big.a) -> Seq Scan on little (cost=0.00..34.00 rows=2400 width=4) -> Hash (cost=1443.00..1443.00 rows=100000 width=4) -> Seq Scan on big (cost=0.00..1443.00 rows=100000 width=4) (5 rows) rhaas=# analyze; ANALYZE rhaas=# explain select * from little, big where little.a = big.a; QUERY PLAN ------------------------------------------------------------------- Hash Join (cost=1.23..1819.32 rows=10 width=8) Hash Cond: (big.a = little.a) -> Seq Scan on big (cost=0.00..1443.00 rows=100000 width=4) -> Hash (cost=1.10..1.10 rows=10 width=4) -> Seq Scan on little (cost=0.00..1.10 rows=10 width=4) (5 rows) This doesn't appear to make a lot of sense, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance