Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance-
> owner@xxxxxxxxxxxxxx] On Behalf Of Gunther
> Sent: Wednesday, November 01, 2017 20:29
> To: pgsql-performance@xxxxxxxxxxxxxx
> Subject:  OLAP/reporting queries fall into nested loops over seq
> scans or other horrible planner choices
> 
> Hi, this is Gunther, have been with PgSQL for decades, on an off this list.
> Haven't been on for a long time making my way just fine. But there is one thing
> that keeps bothering me both with Oracle and PgSQL. And that is the
> preference for Nested Loops.
> 
> Over the years the archives have questions about Nested Loops being chosen
> over Hash Joins. But the responses seem too specific to the people's queries,
> ask many questions, make them post the query plans, and often end up
> frustrating with suggestions to change the data model or to add an index and
> stuff like that.
> 
> One should not have to go into that personal detail.
> 
> There are some clear boundaries that a smart database should just never cross.
> 
> Especially with OLAP queries. Think a database that is fine for OLTP, has
> indexes and the index based accesses for a few records joined with a dozen
> other tables all with indexes is no problem. If you fall into a Seq Scan scenario
> or unwanted Hash Join, you usually forgot to add an index or forgot to put index
> columns into your join or other constraints. Such are novice questions and we
> should be beyond that.
> 
> But the issue is bulk searches, reports, and any analytic queries scenarios. In
> those queries Nested Loops are almost always a bad choice, even if there is an
> index. In over 20 years of working with RDBMs this has been my unfailing
> heuristics. A report runs slow? Look at plan, is there a Nested Loop? Yes?
> Squash it! And the report runs 10x faster instantaneously.
> 
> So, all the more troublesome is if any database system (here PgSQL) would
> ever fall into a Nested Loop trap with CPU spinning at 100% for several
> minutes, with a Nested Loop body of anything from a Seq Scan or worse with a
> cardinality of anything over 10 or 100. Nested Loops of Nested Loops or Nested
> Loops of other complex query plan fragments should be a no-no and chosen
> only as an absolute last resort when the system cannot find enough memory,
> even then disk based merge sort should be better, i.e., Nested Loops should
> never be chosen. Period.
> 
> If you can set enable_nestloop off and the Hash Join is chosen and the
> performance goes from 1 hour of 100% CPU to 10 seconds completion time,
> then something is deadly wrong. And it doesn't matter to me if I should have
> re-written my query in some funny ways or tweaked my data model, these are
> all unacceptable options when you have a complex system with hybrid
> OLTP/OLAP uses. Don't tell me to de-normalize. I know I can materialize joins
> in tables which I can then use again in joins to save time. But that is not the
> point here.
> 
> And I don't think tweaking optimizer statistics is the solution either.
> Because optimizer statistics quickly become worthless when your criteria get
> more complex.
> 
> The point is that Nested Loops should never be chosen except in index lookup
> situations or may be memory constraints.
> 
> How can I prevent it on a query by query scope? I cannot set enable_nestloop =
> off because one query will be for a full report, wile another one might have
> indexed constraints running in the same session, and I don't want to manage
> side effects and remember to set enable_nestloop parameter on and off.
> 
> There must be a way to tell the optimizer to penalize nested loops to make
> them the last resort. In Oracle there are those infamous hints, but they don't
> always work either (or it is easy to make mistakes that you get no feedback
> about).
> 
> Is there any chance PgSQL can get something like a hint feature? Or is there a
> way to use postgresql.conf to penalize nested loops so that they would only ever
> be chosen in the most straight-forward situations as with query parameters
> that are indexed? I know I need to have sufficient work_mem, but if you can set
> enable_nestloop = off and you get the desired Hash Join, there is obviously
> sufficient work_mem, so that isn't the answer either.
> 
> Thanks for listening to my rant.
> 
> regards,
> -Gunther
> 
> 
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

 [Laurent Hasson] 
Hello Gunther,

Just adding to your voice. I recently experienced the same issue with a complex multi-table view, including pivots, and was surprised to see all the nested loops everywhere in spite of indices being available. I spent a lot of time optimizing the query and went from about 1h to about 3mn, but penalizing nested loops in favor of other "joining" techniques seem to make sense as a strategy. Either that, or there is something I really don't understand here either and would love to be educated :)

Laurent.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux