On 14 January 2016 at 03:48, David Grelaud <dgrelaud@xxxxxxxxxxx> wrote:
-- 3) Always avoid nested-loop join when no indexes are available?Tom Lane said "There might be some cases where this would help, but there would be many more where it would be useless or counterproductive."Who is right between Tom Lane and the Leis Viktor's paper above?We tried to disable nested_loop all the time in a production environment and we observed an overall improvement in all queries where Indexes are not useful or not available (CTEs), which confirms the paper.In fact, one of our production environment is still running with "nested_loop off" because benefits are a lot greater than drawbacks as long as some tables are relatively small (Indexes not used).
I don't really think any of them are wrong. Simply Tom is talking in general terms for no specific workload, and the paper is dealing with one specific workload. Of course there are cases when a non-parameterised nested loop are the fastest way, I mean what could possibility be faster if there's only 1 row to be joined, for example. It's just that it's not that much faster since such a join is likely to perform very quickly no matter which join algorithm is used.
On the other hand, if your tables are not tiny, or you're never just joining to just a few rows, and you are suffering from stats underestimations, then it's quite probable that you'll improve your workload overall by doing enable_nestloop = off. But you have to remember that if you do this, then you miss out on parameterised inner scans on nested loops. Quite often these are the fastest option, even when the number of rows is fairly large, as it might save building a hash table on a very large relation, or having to sort that relation for a merge join.
Perhaps separating out enable_nestloop so that it only disables non-parameterised nested loops, and add another GUC for parameterised nested loops would be a good thing to do. Likely setting enable_nestloop to off in production would be a slightly easier decision to make, if that was the case.
It looks pretty simple to do this, so I hacked it up, and attached it here. There's no doc changes and I'm not that interested in fighting for this change, it's more just an idea for consideration.
Attachment:
enable_paramnestloop.patch
Description: Binary data
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general