> -----Original Message----- > From: pgsql-performance-owner@xxxxxxxxxxxxxx [mailto:pgsql-performance- > owner@xxxxxxxxxxxxxx] On Behalf Of Gunther > Sent: Wednesday, November 01, 2017 20:29 > To: pgsql-performance@xxxxxxxxxxxxxx > Subject: OLAP/reporting queries fall into nested loops over seq > scans or other horrible planner choices > > Hi, this is Gunther, have been with PgSQL for decades, on an off this list. > Haven't been on for a long time making my way just fine. But there is one thing > that keeps bothering me both with Oracle and PgSQL. And that is the > preference for Nested Loops. > > Over the years the archives have questions about Nested Loops being chosen > over Hash Joins. But the responses seem too specific to the people's queries, > ask many questions, make them post the query plans, and often end up > frustrating with suggestions to change the data model or to add an index and > stuff like that. > > One should not have to go into that personal detail. > > There are some clear boundaries that a smart database should just never cross. > > Especially with OLAP queries. Think a database that is fine for OLTP, has > indexes and the index based accesses for a few records joined with a dozen > other tables all with indexes is no problem. If you fall into a Seq Scan scenario > or unwanted Hash Join, you usually forgot to add an index or forgot to put index > columns into your join or other constraints. Such are novice questions and we > should be beyond that. > > But the issue is bulk searches, reports, and any analytic queries scenarios. In > those queries Nested Loops are almost always a bad choice, even if there is an > index. In over 20 years of working with RDBMs this has been my unfailing > heuristics. A report runs slow? Look at plan, is there a Nested Loop? Yes? > Squash it! And the report runs 10x faster instantaneously. > > So, all the more troublesome is if any database system (here PgSQL) would > ever fall into a Nested Loop trap with CPU spinning at 100% for several > minutes, with a Nested Loop body of anything from a Seq Scan or worse with a > cardinality of anything over 10 or 100. Nested Loops of Nested Loops or Nested > Loops of other complex query plan fragments should be a no-no and chosen > only as an absolute last resort when the system cannot find enough memory, > even then disk based merge sort should be better, i.e., Nested Loops should > never be chosen. Period. > > If you can set enable_nestloop off and the Hash Join is chosen and the > performance goes from 1 hour of 100% CPU to 10 seconds completion time, > then something is deadly wrong. And it doesn't matter to me if I should have > re-written my query in some funny ways or tweaked my data model, these are > all unacceptable options when you have a complex system with hybrid > OLTP/OLAP uses. Don't tell me to de-normalize. I know I can materialize joins > in tables which I can then use again in joins to save time. But that is not the > point here. > > And I don't think tweaking optimizer statistics is the solution either. > Because optimizer statistics quickly become worthless when your criteria get > more complex. > > The point is that Nested Loops should never be chosen except in index lookup > situations or may be memory constraints. > > How can I prevent it on a query by query scope? I cannot set enable_nestloop = > off because one query will be for a full report, wile another one might have > indexed constraints running in the same session, and I don't want to manage > side effects and remember to set enable_nestloop parameter on and off. > > There must be a way to tell the optimizer to penalize nested loops to make > them the last resort. In Oracle there are those infamous hints, but they don't > always work either (or it is easy to make mistakes that you get no feedback > about). > > Is there any chance PgSQL can get something like a hint feature? Or is there a > way to use postgresql.conf to penalize nested loops so that they would only ever > be chosen in the most straight-forward situations as with query parameters > that are indexed? I know I need to have sufficient work_mem, but if you can set > enable_nestloop = off and you get the desired Hash Join, there is obviously > sufficient work_mem, so that isn't the answer either. > > Thanks for listening to my rant. > > regards, > -Gunther > > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance [Laurent Hasson] Hello Gunther, Just adding to your voice. I recently experienced the same issue with a complex multi-table view, including pivots, and was surprised to see all the nested loops everywhere in spite of indices being available. I spent a lot of time optimizing the query and went from about 1h to about 3mn, but penalizing nested loops in favor of other "joining" techniques seem to make sense as a strategy. Either that, or there is something I really don't understand here either and would love to be educated :) Laurent. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance