Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 3, 2017 at 5:28 AM, Thomas Kellerer <spam_eater@xxxxxxx> wrote:
I do like Oracle's approach with SQL profiles, where you can force the
optimizer to try harder to find a good execution plan. I _think_ it even
runs the statement with multiple plans and compares the expected outcome
with the actual values. Once a better plan is found that plan can be
attached to that query and the planner will use that plan with subsequent
executions.
I have used that approach with Oracle. I didn't like it. It is too difficult, too complicated. Requires all sorts of DBA privileges. Nothing that would help a lowly user trying his ad-hoc queries.

I think a "dynamic feedback plan optimization" would be more innovative and ultimately deliver better on the original RDBMS vision. The RDBMS should exert all intelligence that it can to optimize the query execution. (I know that means: no reliance on hints.)

There is so much more that could be done, such as materialized and potentially indexed partial results. (I know Oracle as materialized partial results).

But the dynamic feedback plan would be even cooler.  So that means the outer relation should be built or sampled to estimate the selectivity, the inner relation should be built completely, and if it is too large, it should be thrown back to the optimizer to change the plan.

Or may be the planner needs some second look pattern matching criticizer: Any pattern of Nested Loop I would re-check and possibly sample a few rows. And Nested Loop with costly inner loop should almost always be avoided. Nested Loop of Seq Scan is a no-no unless it can be proven that the cardinality of the inner relation to scan is less than 100.

But even more, once you have the inner and outer table of a Nested Loop built or sampled, there should be no reason not to run the Hash Join. I guess I still don't get why the optimizer even today would EVER consider a Nested Loop over a Hash Join, unless there is some clear indication that the query will be used to just get the FIRST ROWS (Oracle hint) and that those first rows will actually exist (user waits 30 minutes at 100% CPU only to be informed that the query has no results!), and that the results are likely to come out early in the Nested Loop! So many constraints to make that Nested Loop plan a successful strategy. Why ever choose it???

I guess, hints or no hints, I think Nested Loops should not be used by the optimizer unless it has positive indication  that it meets all the criteria for being a good strategy, i.e., that there is a continuous path of indexed columns starting with constant query parameters. This is the usual OLTP query. And that is what Nested Loops are for. But in all other cases, and if space allows at all, always use Hash Joins. It is even cheaper to do a trial and error! Assume that space will allow, and quit if it doesn't, rather than being sheepish and going to a 1 hour CPU bound operation. Because if space does not allow, the chance for Nested Loop being a good idea is also close to nil! So if space doesn't allow, it would be Sort-Merge on Disk. Especially if the query has a DISTINCT or ORDER BY clause anyway! Why is that not always a better strategy?

And yes, until all this is figured out: by all means include the pg_hint_plan.c -- pretty please!

regards,
-Gunther



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux