Re: Too few rows expected by Planner on partitioned tables

Julian Wolf <julian.wolf@xxxxxxxxxxx> · Wed, 26 Aug 2020 06:54:39 +0000

Hi Justin,

thank you very much for your help and sorry for the late answer. 

After testing around with your suggestions, it actually was the daterange type which caused all the problems. Messing around with the statistics value improved performance drastically but did not solve the problem. We decided to replace the daterange type with
 a BIGINT and calculate the "id" of the daterange by just using the BIGINT (2x 4 bytes) representation of the daterange. Thus, it can be transformed in both directions immutably.

CREATE OR REPLACE FUNCTION to_daterange_id(daterange DATERANGE)
    RETURNS BIGINT
    IMMUTABLE
    LANGUAGE plpgsql
AS
$$
BEGIN
    return (extract(EPOCH FROM lower(daterange))::BIGINT << 32) |
           extract(EPOCH FROM upper(daterange))::BIGINT;
end;

--------------------------------------------------------------------------------------------------------------
CREATE OR REPLACE FUNCTION to_daterange(daterange_id BIGINT)
    RETURNS DATERANGE
    IMMUTABLE
    LANGUAGE plpgsql
AS
$$
BEGIN
    RETURN daterange(to_timestamp(daterange_id >> 32)::DATE, to_timestamp(daterange_id & x'FFFFFFFF'::BIGINT)::DATE);
END;
$$;

So there is no daterange object messing up the primary key index. Your other suggestions sadly didn't work, as the daterange was the partition key of the table too, this field was inevitably the first criterion in all queries and thus overruled every other
 index. 

With that said and done, it would be nice, if daterange objects could be used in unique indexes too. They are a great way to identify data which represents a week, month, etc. worth of data (similar to a two-column-date representation).

Thank you very much again for your time and help

Julian

Julian P. Wolf | Invenium Data Insights GmbH
julian.wolf@xxxxxxxxxxx | +43 664 88 199 013
Herrengasse 28 
| 8010 Graz | www.invenium.io

From: Justin Pryzby <pryzby@xxxxxxxxxxxxx>

Sent: Wednesday, July 22, 2020 4:40 PM

To: Julian Wolf <julian.wolf@xxxxxxxxxxx>

Cc: pgsql-performance Postgres Mailing List <pgsql-performance@xxxxxxxxxxxxxxxxxxxx>

Subject: Re: Too few rows expected by Planner on partitioned tables

On Tue, Jul 21, 2020 at 01:09:22PM +0000, Julian Wolf wrote:

> Hello,

> 

> A description of what you are trying to achieve and what results you expect:

> Our database is growing on a daily basis by about 2.5million rows per table (2 at the moment). Because of that, we decided to partition the data, especially, as we are pre-aggregating the data for weeks, months, quarters and years. Every aggregation is stored
 in a separate partition:

> 

...

> Our problem is, that the planner always predicts one row to be returned, although only a part of the primary key is queried. This problem exceeds feasibility of performance rapidly - a query only involving a few days already takes dozens of seconds. All tables
 are analyzed and pg_stats looks reasonable IMHO.

...

>     PRIMARY KEY ( daterange, spatial_feature_id, visitor_profile_id, activity_type_combination_id,

>                  activity_chain_id),

...

> ) PARTITION BY LIST (daterange);

> schemaname relname n_live_tup

> mobility_insights location_statistics_y2019m03d 23569853

> mobility_insights location_statistics_y2019m03w 19264373

> mobility_insights location_statistics_y2019m03 18105295

> Aggregate  (cost=2.79..2.80 rows=1 width=8) (actual time=143.073..143.073 rows=1 loops=1)

>   Buffers: shared hit=67334

>   ->  Index Scan using location_statistics_y2019m03w_pkey on location_statistics_y2019m03w st  (cost=0.56..2.78 rows=1 width=8) (actual time=0.026..117.284 rows=516277 loops=1)

>         Index Cond: ((daterange = '[2019-03-04,2019-03-11)'::daterange) AND (spatial_feature_id = 12675))

>         Buffers: shared hit=67334

I guess this isn't actually the problem query, since it takes 143ms and not

dozens of seconds.  I don't know what is the problem query, but maybe it might

help to create an new index on spatial_feature_id, which could be scanned

rather than scanning the unique index.

Also, if daterange *and* spatial_feature_id are always *both* included, then

this might work:

postgres=# CREATE STATISTICS t_stats (mcv) ON daterange,spatial_feature_id FROM t ; ANALYZE t;

-- 

Justin