Re: Bad selectivity estimate when using a sub query to determine WHERE condition

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Mon, 10 Feb 2020 13:13:04 -0600

On Mon, Feb 10, 2020 at 11:34:01AM +0100, Chris Borckholder wrote:
> I have a large table of immutable events that need to be aggregated
> regularly to derive statistics. To improve the performance, that table is
> rolled up every 15minutes, so that online checks can aggregate rolled up
> data and combine it with latest events created after the last roll up.
> 
> To implement this a query is executed that selects only events after the
> time of the last rollup.
> That time is determined dynamically based on a log table.

Perhaps that could be done as an indexed column in the large table, rather
than querying a 2nd log table.
Possibly with a partial index on that column: WHERE unprocessed='t'.

> When using a sub select or CTE to get the latest roll up time, the query
> planner fails to recognize that a most of the large table would be filtered
> out by the condition and tries a sequential scan instead of an index scan.
> When using the literal value for the WHERE condition, the plan correctly
> uses an index scan, which is much faster.
> 
> I analyzed the involved tables and increased the collected histogram, but
> the query plan did not improve. Is there a way to help the query planner
> recognize this in the dynamic case?

Also, if you used partitioning with pgostgres since v11, then I think most
partitions would be excluded:

https://www.postgresql.org/docs/12/release-12.html
|Allow partition elimination during query execution (David Rowley, Beena Emerson)
|Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=499be013de65242235ebdde06adb08db887f0ea5

https://www.postgresql.org/about/featurematrix/detail/332/

Justin