No hash join across partitioned tables?

Kris Jurka <books@xxxxxxxxxx> · Thu, 16 Apr 2009 19:09:37 -0400 (EDT)

PG (8.3.7) doesn't seem to want to do a hash join across two partitioned 
tables.  I have two partition hierarchies: impounds (with different 
impound sources) and liens (with vehicle liens from different companies). 
Trying to match those up gives:

EXPLAIN SELECT COUNT(*)
FROM impounds i
	JOIN liens l ON (i.vin = l.vin);

 Aggregate  (cost=11164042.66..11164042.67 rows=1 width=0)
   ->  Nested Loop  (cost=0.27..3420012.94 rows=3097611886 width=0)
         Join Filter: ((i.vin)::text = (l.vin)::text)
         ->  Append  (cost=0.00..1072.77 rows=33577 width=21)
               ->  Seq Scan on impounds i  (cost=0.00..11.40 rows=140 width=21)
               ->  Seq Scan on impounds_s1 i  (cost=0.00..926.87 rows=29587 width=18)
               ->  Seq Scan on impounds_s2 i  (cost=0.00..99.96 rows=3296 width=18)
               ->  Seq Scan on impounds_s3 i  (cost=0.00..23.14 rows=414 width=18)
               ->  Seq Scan on impounds_s4 i  (cost=0.00..11.40 rows=140 width=21)
         ->  Append  (cost=0.27..101.64 rows=15 width=21)
               ->  Bitmap Heap Scan on liens l  (cost=0.27..5.60 rows=2 width=21)
                     Recheck Cond: ((l.vin)::text = (i.vin)::text)
                     ->  Bitmap Index Scan on liens_pk  (cost=0.00..0.27 rows=2 width=0)
                           Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using liens_s1_pk on liens_s1 l  (cost=0.00..7.02 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using liens_s2_pk on liens_s2 l  (cost=0.00..3.47 rows=1 width=21)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s3_pk on liens_s3 l  (cost=0.00..7.52 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s4_pk on liens_s4 l  (cost=0.00..7.67 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s5_pk on liens_s5 l  (cost=0.00..7.62 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s6_pk on liens_s6 l  (cost=0.00..7.61 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s7_pk on liens_s7 l  (cost=0.00..7.50 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s8_pk on liens_s8 l  (cost=0.00..7.36 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s9_pk on liens_s9 l  (cost=0.00..7.43 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s10_pk on liens_s10 l  (cost=0.00..7.79 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s11_pk on liens_s11 l  (cost=0.00..8.07 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s12_pk on liens_s12 l  (cost=0.00..8.45 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)
               ->  Index Scan using newliens_s13_pk on liens_s13 l  (cost=0.00..8.53 rows=1 width=18)
                     Index Cond: ((l.vin)::text = (i.vin)::text)

This takes quite a while as it's got to do tons of index probes which 
results it tons of random IO.  I killed this after five minutes of 
running.

But if I do:

CREATE TABLE i1 AS SELECT * FROM impounds;
CREATE TABLE l1 AS SELECT * FROM liens;

I get a reasonable plan, which runs in about 15 seconds, from:

EXPLAIN SELECT COUNT(*)
FROM i1 i
        JOIN l1 l ON (i.vin = l.vin);

 Aggregate  (cost=749054.78..749054.79 rows=1 width=0)
   ->  Hash Join  (cost=1444.18..748971.43 rows=33338 width=0)
         Hash Cond: ((l.vin)::text = (i.vin)::text)
         ->  Seq Scan on l1 l  (cost=0.00..332068.96 rows=18449996 
width=18)
         ->  Hash  (cost=1027.97..1027.97 rows=33297 width=18)
               ->  Seq Scan on i1 i  (cost=0.00..1027.97 rows=33297 
width=18)

I've tried to force the hash join plan on the partitioned tables via:

set enable_nestloop to off;

This results in a merge join plan which needs to do a giant sort, again 
killed after five minutes.

 Aggregate  (cost=58285765.20..58285765.21 rows=1 width=0)
   ->  Merge Join  (cost=4077389.31..50541735.48 rows=3097611886 width=0)
         Merge Cond: ((i.vin)::text = (l.vin)::text)
         ->  Sort  (cost=4286.45..4370.39 rows=33577 width=21)
               Sort Key: i.vin
               ->  Append  (cost=0.00..1072.77 rows=33577 width=21)
                     ->  Seq Scan on impounds i  (cost=0.00..11.40 rows=140 width=21)
                     ->  [Seq Scans on other partitions]
         ->  Materialize  (cost=4073102.86..4303737.81 rows=18450796 width=21)
               ->  Sort  (cost=4073102.86..4119229.85 rows=18450796 width=21)
                     Sort Key: l.vin
                     ->  Append  (cost=0.00..332797.96 rows=18450796 width=21)
                           ->  Seq Scan on liens l  (cost=0.00..14.00 rows=400 width=21)
                           ->  [Seq Scans on other partitions]

Disabling mergejoin pushes it back to a nestloop join.  Why can't it hash 
join these two together?

Kris Jurka

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance