On Wed, Apr 13, 2011 at 1:22 PM, Scott Carey <scott@xxxxxxxxxxxxxxxxx> wrote:
> A pathological skew case (all relations with the same key) should be
> _cheaper_ to probe. There should be only _one_ entry in the hash (for
> the one key), and that entry will be a list of all relations matching the
> key. Therefore, hash probes will either instantly fail to match on an
> empty bucket, fail to match the one key with one compare, or match the one
> key and join on the matching list.
>
> In particular for anti-join, high skew should be the best case scenario.

I think this argument may hold some water for an anti-join, and maybe
for a semi-join, but it sure doesn't seem right for any kind of join
that has to iterate over all matches (rather than just the first one);
that is, inner, left, right, or full.

> A hash structure that allows multiple entries per key is inappropriate for
> skewed data, because it is not O(n). One that has one entry per key
> remains O(n) for all skew. Furthermore, the hash buckets and # of entries
> are proportional to n_distinct in this case, and smaller and more cache-
> and memory-friendly to probe.

I don't think this argument is right. The hash table is sized for a
load factor significantly less than one, so if there are multiple
entries in a bucket, it is fairly likely that they are all for the
same key. Granted, we have to double-check the keys to figure that
out, but I believe that the data structure you are proposing would
require similar comparisons. The only difference is that they'd be
required when building the hash table rather than when probing it.

> You can put either relation on the outside with an anti-join, but would
> need a different algorithm and cost estimator if done the other way
> around. Construct a hash on the join key that keeps a list of relations
> per key; iterate over the other relation, and remove the key and
> corresponding list from the hash when there is a match; when complete, the
> remaining items in the hash are the result of the join (also already
> grouped by the key). It could be terminated early if all entries are
> removed.
>
> This would be useful if the hash was small, the other side of the hash too
> large to fit in memory, and the alternative was a massive sort on the other
> relation.

This would be a nice extension of commit
f4e4b3274317d9ce30de7e7e5b04dece7c4e1791.

> Does the hash cost estimator bias towards smaller hashes due to hash probe
> cost increasing with hash size due to processor caching effects? It's not
> quite O(n) due to caching effects.

I don't think we account for that (and I'm not convinced we need to).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
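
To make the two hash-table shapes under discussion concrete, here is a
toy sketch in Python (purely illustrative; this is not PostgreSQL's
hash join code, and the function names and bucket count are invented).
It shows where the key comparisons land in each shape, and why only
joins that can stop at the first match get cheaper under skew:

# Toy sketch, not PostgreSQL code.  Tuples are (join_key, payload)
# pairs; NBUCKETS stands in for the executor sizing the table to a
# load factor below one.
NBUCKETS = 1024

def build_entry_per_tuple(inner):
    # Current shape: every inner tuple is its own chain entry.  A
    # heavily skewed key yields a long chain, but building needs no
    # key comparisons at all.
    buckets = [[] for _ in range(NBUCKETS)]
    for key, payload in inner:
        buckets[hash(key) % NBUCKETS].append((key, payload))
    return buckets

def build_entry_per_key(inner):
    # Proposed shape: one chain entry per distinct key, carrying the
    # list of matching tuples.  The key comparisons have not gone
    # away; they have moved from probe time to build time.
    buckets = [[] for _ in range(NBUCKETS)]
    for key, payload in inner:
        chain = buckets[hash(key) % NBUCKETS]
        for entry in chain:
            if entry[0] == key:        # comparison paid while building
                entry[1].append(payload)
                break
        else:
            chain.append((key, [payload]))
    return buckets

# The probes below assume the per-key shape built above.

def probe_inner(buckets, key, opayload):
    # Inner/left/right/full joins must emit *every* match, so even
    # with the per-key shape the long tuple list is walked in full;
    # skew is no cheaper here.
    out = []
    for entry_key, payloads in buckets[hash(key) % NBUCKETS]:
        if entry_key == key:
            out.extend((opayload, p) for p in payloads)
    return out

def probe_semi_or_anti(buckets, key):
    # Semi/anti joins only ask "is there any match?", so the per-key
    # shape answers after at most one comparison per chain entry --
    # the best case Scott describes for high skew.
    return any(entry_key == key
               for entry_key, _ in buckets[hash(key) % NBUCKETS])

Either way, a skewed key means a long list somewhere; the per-key
shape only changes who walks it, and when.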
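
Along the same lines, a hedged sketch of the reversed anti-join Scott
describes: hash the relation whose unmatched rows we want to keep,
stream the other (possibly much larger) relation past it deleting keys
as they match, and terminate early if the table empties. The function
name and tuple shapes are invented; this is a sketch of the proposal,
not an existing PostgreSQL code path:

from collections import defaultdict

def reversed_hash_anti_join(keep_side, stream_side):
    # Build: one entry per distinct join key, holding the list of
    # tuples sharing that key.
    table = defaultdict(list)
    for key, payload in keep_side:
        table[key].append(payload)

    # Probe: a match disqualifies the whole key, so drop it outright.
    for key, _ in stream_side:
        table.pop(key, None)
        if not table:          # early termination: everything matched
            break

    # Survivors are keep_side rows with no partner in stream_side,
    # conveniently already grouped by key.
    return table

# Example: rows of keep_side whose keys never appear in stream_side
# survive.
keep = [(1, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
stream = [(1, 'x'), (4, 'y')]
print(dict(reversed_hash_anti_join(keep, stream)))  # {2: ['c'], 3: ['d']}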