Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Oct 18, 2010, at 8:43 PM, Tom Lane wrote:

> Scott Carey <scott@xxxxxxxxxxxxxxxxx> writes:
>> I consistently see HashJoin plans that hash the large table, and scan
>> the small table.
> 
> Could we see a self-contained test case?  And what cost parameters are
> you using, especially work_mem?

I'll see if I can make a test case.  

Tough to do since I catch these on a production machine when a query is taking a long time, and some of the tables are transient.
work_mem is 800MB, I tried 128K and it didn't matter.  It will switch to merge join at some point, but not a smaller hash, and the merge join is definitely slower.

> 
>> This is especially puzzling in some cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan the big one?
> 
> Well, size of the table isn't the only factor; in particular, a highly
> nonuniform distribution of the key value will inflate the cost estimate
> for using a table on the inner size of the hash.  But the example you
> show here seems a bit extreme.
> 

In the case today, the index scan returns unique values, and the large table has only a little skew on the join key.

Another case I ran into a few weeks ago is odd. 
(8.4.3 this time)

rr=> explain INSERT INTO pav2 (p_id, a_id, values, last_updated, s_id) SELECT av.p_id, av.a_id, av.values, av.last_updated, a.s_id 
FROM pav av, attr a where av.a_id = a.id;
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Hash Join  (cost=2946093.92..631471410.73 rows=1342587125 width=69)
   Hash Cond: (a.id = av.a_id)
   ->  Seq Scan on attr a  (cost=0.00..275.21 rows=20241 width=8)
   ->  Hash  (cost=1200493.44..1200493.44 rows=70707864 width=65)
         ->  Seq Scan on pav av  (cost=0.00..1200493.44 rows=70707864 width=65)

If the cost to hash is 1200493, and it needs to probe the hash 20241 times, why would the total cost be 631471410?  The cost to probe can't be that big! A cost of 500 to probe and join?  
Why favor hashing the large table and probing with the small values rather than the other way around?

In this case, I turned enable_mergejoin off in a test because it was deciding to sort the 70M rows instead of hash 20k rows and scan 70M, and then got this 'backwards' hash.  The merge join is much slower, but the cost estimate is much less and no combination of cost parameters will make it switch (both estimates are affected up and down similarly by the cost parameters).

Both tables analyzed, etc.  One of them is a bulk operation staging table with no indexes (the big one), but it is analyzed.  The (av.p_id, av.a_id) pair is unique in it. a.id is unique (primary key). The above thinks it is going to match 20 times on average (but it actually matches only 1 -- PK join).  av.a_id is somewhat skewed , but that is irrelevant if it always matches one.  Even if it did match 20 on average, is it worse to probe a hash table 70M times and retrieve 20 maches each time than probe 20k times and retrive 70000 matches each time?  Its the same number of hash function calls and comparisons, but different memory profile.




> 			regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux