Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

Gunther <raj@xxxxxxxx> · Fri, 3 Nov 2017 10:36:57 -0400

On Fri, Nov 3, 2017 at 5:28 AM, Thomas Kellerer <spam_eater@xxxxxxx> wrote:
I do like Oracle's approach with SQL profiles, where you can force the
optimizer to try harder to find a good execution plan. I _think_ it even
runs the statement with multiple plans and compares the expected outcome
with the actual values. Once a better plan is found that plan can be
attached to that query and the planner will use that plan with subsequent
executions.
I have used that approach with Oracle. I didn't like it. It is too 
difficult, too complicated. Requires all sorts of DBA privileges. 
Nothing that would help a lowly user trying his ad-hoc queries.

I think a "dynamic feedback plan optimization" would be more innovative 
and ultimately deliver better on the original RDBMS vision. The RDBMS 
should exert all intelligence that it can to optimize the query 
execution. (I know that means: no reliance on hints.)

There is so much more that could be done, such as materialized and 
potentially indexed partial results. (I know Oracle as materialized 
partial results).

But the dynamic feedback plan would be even cooler.  So that means the 
outer relation should be built or sampled to estimate the selectivity, 
the inner relation should be built completely, and if it is too large, 
it should be thrown back to the optimizer to change the plan.

Or may be the planner needs some second look pattern matching 
criticizer: Any pattern of Nested Loop I would re-check and possibly 
sample a few rows. And Nested Loop with costly inner loop should almost 
always be avoided. Nested Loop of Seq Scan is a no-no unless it can be 
proven that the cardinality of the inner relation to scan is less than 100.

But even more, once you have the inner and outer table of a Nested Loop 
built or sampled, there should be no reason not to run the Hash Join. I 
guess I still don't get why the optimizer even today would EVER consider 
a Nested Loop over a Hash Join, unless there is some clear indication 
that the query will be used to just get the FIRST ROWS (Oracle hint) and 
that those first rows will actually exist (user waits 30 minutes at 100% 
CPU only to be informed that the query has no results!), and that the 
results are likely to come out early in the Nested Loop! So many 
constraints to make that Nested Loop plan a successful strategy. Why 
ever choose it???

I guess, hints or no hints, I think Nested Loops should not be used by 
the optimizer unless it has positive indication  that it meets all the 
criteria for being a good strategy, i.e., that there is a continuous 
path of indexed columns starting with constant query parameters. This is 
the usual OLTP query. And that is what Nested Loops are for. But in all 
other cases, and if space allows at all, always use Hash Joins. It is 
even cheaper to do a trial and error! Assume that space will allow, and 
quit if it doesn't, rather than being sheepish and going to a 1 hour CPU 
bound operation. Because if space does not allow, the chance for Nested 
Loop being a good idea is also close to nil! So if space doesn't allow, 
it would be Sort-Merge on Disk. Especially if the query has a DISTINCT 
or ORDER BY clause anyway! Why is that not always a better strategy?

And yes, until all this is figured out: by all means include the 
pg_hint_plan.c -- pretty please!

regards,
-Gunther

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance