Re: Re: Query planner using hash join when merge join seems orders of magnitude faster

Tom Lane <tgl@xxxxxxxxxxxxx> · Mon, 01 Aug 2016 18:56:49 -0400

Branden Visser <mrvisser@xxxxxxxxx> writes:
> I just wanted to update that I've found evidence that fixing the
> planner row estimation may not actually influence it to use the more
> performant merge join instead of hash join. I have found instances
> where the row estimation is *overestimated* by a magnitude of 4x
> (estimates 2.4m rows) and still chooses hash join over merge join,
> where merge join is much faster (45s v.s. 12s).

I wonder why the merge join is faster exactly.  It doesn't usually have a
huge benefit unless the inputs are presorted already.  The one case I can
think of where it can win quite a lot is if the range of merge keys in one
input is such that we can skip reading most of the other input.  (Extreme
example: one input has keys 1..10, but the other input has keys 1..10000.
We only need to read the first 1% of the second input, assuming there's an
index on its key column so that we don't have to read the whole thing
anyway to sort it.)

The planner is aware of that effect, but I wonder if it's misestimating it
for some reason.  Anyway it would be worth looking closely at your EXPLAIN
ANALYZE results to determine whether early-stop is happening or not.  It'd
manifest as one join input node showing an actual number of rows returned
that's less than you'd expect.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general