Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> writes:
> Tom Lane wrote:
>> It calls get_variable_range, which only looks at the pg_statistic entries.

> Uh?  It's right there in line 2976 in HEAD.

Meh.  You're right, I was thinking of this bit in get_variable_range()

    /*
     * XXX It's very tempting to try to use the actual column min and max, if
     * we can get them relatively-cheaply with an index probe.  However, since
     * this function is called many times during join planning, that could
     * have unpleasant effects on planning speed.  Need more investigation
     * before enabling this.
     */
#ifdef NOT_USED
    if (get_actual_variable_range(root, vardata, sortop, min, max))
        return true;
#endif

I think when that was written, we didn't have the code in scalarineqsel
that tries to go out and get the actual endpoints from an index.  Now
that we do, the planning-cost impact I was afraid of here can actually
bite us, and it seems that at least in Sergey's case it's pretty bad.

Another problem is that we'll end up comparing endpoints gotten from
pg_statistic to endpoints gotten from the index, making the resulting
numbers at least self-inconsistent and very possibly meaningless.

The planner already caches the results of mergejoinscansel in hopes of
alleviating its cost, but I wonder if we need another, lower-level cache
for the min/max values of each variable that participates in a
mergejoinable clause.

Having said that, it's still not clear why these probes are so expensive
in Sergey's case.  I favor your idea about lots of dead rows, but we
don't have actual proof of it.  Maybe pgstattuple could help here?
(A quick sketch is in the PS below.)

			regards, tom lane
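PS: a minimal way to test the dead-rows theory, assuming the pgstattuple
contrib module is installed ("some_table" here is just a placeholder for
the table in Sergey's query):

    -- dead_tuple_count/dead_tuple_percent show how many dead rows the
    -- index endpoint probes might have to skip before hitting a live one
    SELECT tuple_count, dead_tuple_count, dead_tuple_percent
      FROM pgstattuple('some_table');

Bear in mind that pgstattuple reads the whole table, so it's not free
on a big relation; but a large dead_tuple_percent at the ends of the
index order would fit the symptoms.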