Re: HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 4/13/11 10:35 AM, "Tom Lane" <tgl@xxxxxxxxxxxxx> wrote:

>Scott Carey <scott@xxxxxxxxxxxxxxxxx> writes:
>>> On Oct 27, 2010, at 12:56 PM, Tom Lane wrote:
>>> Because a poorly distributed inner relation leads to long hash chains.
>>> In the very worst case, all the keys are on the same hash chain and it
>>> degenerates to a nested-loop join.
>
>> A pathological skew case (all relations with the same key), should be
>> _cheaper_ to probe.
>
>I think you're missing the point, which is that all the hash work is
>just pure overhead in such a case (and it is most definitely not
>zero-cost overhead).  You might as well just do a nestloop join.
>Hashing is only beneficial to the extent that it allows a smaller subset
>of the inner relation to be compared to each outer-relation tuple.
>So I think biasing against skew-distributed inner relations is entirely
>appropriate.

No it is not pure overhead, and nested loops is far slower.  The only way
it is the same is if there is only _one_ hash bucket!  And that would be a
bug...
In the pathological skew case:

Example:  1,000,000 outer relations.  10,000 inner relations, all with one
key.

Nested loops join:
10 billion compares.

Hashjoin with small inner relation hashed with poor hash data structure:
1. 10,000 hash functions to build the hash (10,000 'puts').
2. 1,000,000 hash functions to probe (1,000,000 'gets').
3. Only keys that fall in the same bucket trigger a compare.  Assume 100
hash buckets (any less is a bug, IMO) and skew such that the bucket is 10x
more likely than average to be hit. 100,000 hit the bucket.  Those that
match are just like nested loops -- this results in 1 billion compares.
All other probes hit an empty bucket and terminate without a compare.
Total: 1.01 million hash functions and bucket seeks, 0.01 of which are
hash 'puts', + 1 billion compares


Hashjoin with 'one entry per key; entry value is list of matching
relations' data structure:
1. 10,000 hash functions to build the hash (10,000 'puts').
2. 1,000,000 hash functions to probe (1,000,000 'gets').
3. Only keys that fall in the same bucket trigger a compare.  Assume 100
hash buckets and enough skew so that the bucket is 10x as likely to be
hit. 100,000 hit bucket.  Those that match only do a compare against one
key -- this results in 100,000 compares.
Total: 1.01 million hash functions and bucket seeks, 0.01 of which are
slightly more expensive hash 'puts', + 0.1 million compares



10 billion compares is much more expensive than either hash scenario.  If
a hash function is 5x the cost of a compare, and a hash 'put' 2x a 'get'
then the costs are about:

10 billion,
1.006 billion,
~6 million

The cost of the actual join output is significant (pairing relations and
creating new relations for output) but close to constant in all three.



In both the 'hash the big one' and 'hash the small one' case you have to
calculate the hash and seek into the hash table the same number of times.
10,000 hash calculations and 'puts' + 1,000,000 hash calculations and
'gets', versus 1,000,000 hash 'puts' and 10,000 'gets'.
But in one case that table is smaller and more cache efficient, making
those gets and puts cheaper.

Which is inner versus outer changes the number of buckets, which can alter
the number of expected compares, but that can be controlled for benefit --
the ratio of keys to buckets can be controlled.  If you choose the smaller
relation, you can afford to overcompensate with more buckets, resulting in
more probes on empty buckets and thus fewer compares.

Additionally, a hash structure that only has one entry per key can greatly
reduce the number of compares and make hashjoin immune to skew from the
cost perspective.  It also makes it so that choosing the smaller relation
over the big one to hash is always better provided the number of buckets
is chosen well.


>
>            regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux