Robert Haas <robertmhaas@xxxxxxxxx> writes:
> I'm also a bit suspicious of the fact that the hash condition has a
> cast to text on both sides, which implies, to me anyway, that the
> underlying data types are not text.  That might mean that the query
> planner doesn't have very good statistics, which might mean that the
> join selectivity estimates are wackadoo, which can apparently cause
> this problem:

Um ... you're guilty of the same thing as the OP, ie not showing how
you got this example.  But I'm guessing that it was something like

create table little as select * from generate_series(1,10) a;
create table big as select * from generate_series(1,100000) a;
... wait for auto-analyze of big ...
explain select * from little, big where little.a = big.a;

Here, big is large enough to prod autovacuum into analyzing it,
whereas little isn't.  So when the planner runs, it sees

(1) big is known to have 100000 rows, and big.a is known unique;
(2) little is estimated to have many fewer rows, but nothing is
    known about the distribution of little.a.

In this situation, it's going to prefer to hash big, because hash
join behaves pretty nicely when the inner rel is uniformly distributed
and the outer not, but not nicely at all when it's the other way
round.  It'll change its mind as soon as you analyze little, but it
doesn't like taking a chance on an unknown distribution.  See
cost_hashjoin and particularly estimate_hash_bucketsize.

I'm not convinced this explains Scott's results though --- the numbers
he's showing don't seem to add up even if you assume a pretty bad
distribution for the smaller rel.

			regards, tom lane
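
For anyone wanting to reproduce this, a minimal untested sketch of the
scenario above, with an explicit ANALYZE standing in for waiting on
autovacuum:

-- sketch: rebuild the example tables from the guess above
create table little as select * from generate_series(1,10) a;
create table big as select * from generate_series(1,100000) a;
analyze big;          -- big gets stats, little stays unanalyzed
explain select * from little, big where little.a = big.a;
-- expect big on the inner (hashed) side of the hash join here
analyze little;
explain select * from little, big where little.a = big.a;
-- with stats on both rels, the planner should now hash little instead

The two EXPLAIN outputs before and after the second ANALYZE are where
the plan should flip, per the explanation above.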
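
And a quick way to check which rels the planner actually has stats
for (pg_stat_user_tables and pg_stats are standard catalog views; the
table names are from the example above):

select relname, n_live_tup, last_analyze, last_autoanalyze
from pg_stat_user_tables
where relname in ('little', 'big');

select tablename, attname, n_distinct, null_frac
from pg_stats
where tablename in ('little', 'big');

An unanalyzed rel shows NULL in last_analyze/last_autoanalyze and has
no pg_stats rows, which is the "nothing is known about the
distribution" case described above.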