Hi Tom,
Thank you for your suggestions. I tried increasing from_collapse_limit and join_collapse_limit to 16 in a specific session, and that significantly improved my query performance (it takes < 2s now). My instinct is now to raise these globally, but I'm sure there are some drawbacks to that, so I'll need to read more about it first.
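For reference, this is roughly what I ran in my session; the cluster-wide version below is only what I assume the global change would look like (ALTER SYSTEM plus a reload), since I haven't actually tried it yet:

  -- per-session only; affects nothing but my current connection
  SET from_collapse_limit = 16;
  SET join_collapse_limit = 16;

  -- untested guess at the global equivalent for the whole cluster:
  -- ALTER SYSTEM SET from_collapse_limit = 16;
  -- ALTER SYSTEM SET join_collapse_limit = 16;
  -- SELECT pg_reload_conf();  -- reload so the new settings take effect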
Your point about parallelism is interesting; I hadn't considered that.
Even after working with Postgres for years, there is still a lot about query optimization that is new to me. I'd never heard of these parameters before your email, since almost every performance issue I've hit so far was resolved by creating an index or rewriting the query.
I'm reading the documentation on these specific parameters, but it's written as a reference page rather than an explanation of query planning and optimization. I wonder if there is a class or book that covers these details in more depth.
Anyway, thank you so much for pointing me in the right direction.
Best,
Arjun
On Thu, Sep 27, 2018 at 1:21 PM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Arjun Ranade <ranade@xxxxxxxxxxxxxxxxx> writes:
> I have a strange performance situation that I cannot resolve with my usual
> process.
> I have a SELECT statement that completes in about 12 seconds for the full
> result (~1100 rows).
> If I create an empty table first, and then INSERT with the SELECT query, it
> takes 6.5 minutes.
> When I look at the EXPLAIN ANALYZE output, it seems that it's using a
> drastically different query plan for the INSERT+SELECT than SELECT by
> itself.
The reason for the plan shape difference is probably that the bare SELECT
is allowed to use parallelism while INSERT/SELECT isn't. I'm not sure
to what extent we could relax that without creating semantic gotchas.
However, your real problem with either query is that the planner's
rowcount estimates are off by several orders of magnitude. If you could
improve that, you'd likely get better plan choices in both cases.
I also notice that this seems to be a 14-way join, which means you're
probably getting an artificially poor plan as a result of
from_collapse_limit and/or join_collapse_limit constraining the planner's
search space. Maybe raising those limits would help, although I'm not
sure how much it'd help if the rowcount estimates aren't improved.
Since you haven't told us much of anything about the actual query or the
data, it's hard to offer concrete advice beyond that.
regards, tom lane