Re: Too few rows expected by Planner on partitioned tables

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Tue, 21 Jul 2020 12:27:13 -0500

On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:
> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables are analyzed and pg_stats looks reasonable IMHO.

>     daterange                daterange NOT NULL,
>     spatial_feature_id           INTEGER,

> Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)
>   Buffers: shared hit=67334
>   ->  Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)
>         Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))
>         Buffers: shared hit=67334
> 
> As can be seen, the planner predicts one row to be returned, although it should be around 3% (11% of the entries are of the given ID, which are distributed over 4 weeks = date ranges) of the table. Using the partition table directly, does not change this fact.

Is there a correlation between daterange and spacial_feature_id ?

Are the estimates good if you query on *only* daterange?  spacial_feature_id ?

Maybe what you need is:
https://www.postgresql.org/docs/devel/sql-createstatistics.html
CREATE STATISTICS stats (dependencies) ON daterange, spacial_feature_id FROM location_statistics;
ANALYZE location_statistics;

-- 
Justin