Hi all, I seem to be in a bit of a jam with the query planner choosing a significantly less optimal route for a query. First, a common check list I've seen around when researching these types of issues on list: 1. pgsql version is PostgreSQL 9.4.7 running on Amazon RDS 2. Vacuum analyze was run 3. default_statistics_target was increased to 10,000 for the whole test database and re-analyzed Code and info referenced is in a github gist [1]. The files are: [Q-Report.sql] The actual SQL query I'm trying to optimize, runs a report on a large set of data [QP-Report-Normal.txt] The query plan of Q-Report.sql without any modification of the query plan [QP-Report-DisableHashJoinltxt] The query plan of Q-Report.sql after `set enable_hashjoin = false` [Q-CC-Underestimated.sql] An isolated query that shows 1 of potentially 2 significant row count underestimations [QP-CC-Understimated.txt] The query plan of Q-CC-Underestimated.sql Essentially, my report query has a hash join that is taking about 70% of the query time (45s), whereas a merge join seems to take less than a second. It looks like there are probably 2 significant row underestimations which may be contributing to this (first of which is the Q-CC query), but I haven't been able to validate that this is specifically the reason the query planner picks a hash join. My usage on this schema are mostly queries that get very small slices of data, however this particular query is a background workload that generates denormalized reports. That said, any kind of global configuration being added that would jeopardize the performance of the small-slice queries is less desirable. My question is, what would be the right way to go about diagnosing and optimizing this kind of issue? I'm trying to think of ways to potentially alter the structure of the query to reduce impact of the planning estimates, but I'm not coming up with any good ideas there. As mentioned, stats target was increased to 10,000 for testing and while it changed the row estimates, it was only by about 12 rows -- did I maybe do something wrong there? With this particular kind of query, I feel that I'd be willing to trade out for a vast "overestimation" scenario for the analyzer, but I can't find anything that allows me to tweak this for a particular query. I've come across pg_hint_plan but doesn't seem like it would support my version of PSQL and I'd prefer to be able to stay on RDS if possible. I came across a pending patch for multivariate statistics, but that seems a little bit far out at this point. Any advice for workarounds or solutions would be greatly appreciated! Thanks in advance, Branden [1] https://gist.github.com/mrvisser/a22dddbdaa64ae1aa3dbc9637617b70d -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general