Re: Major performance degradation with joins in 15.8 or 15.7?

Ed Sabol <edwardjsabol@xxxxxxxxx> · Thu, 7 Nov 2024 21:45:37 -0500

On Nov 7, 2024, at 9:27 PM, Andrei Lepikhov <lepihov@xxxxxxxxx> wrote:
> Postgres didn't want Materialize in this example because of the low estimation on its outer subquery. AFAIC, by increasing the *_page_cost's value, you added extra weight to the inner subquery and shifted the decision to use materialisation.

Interesting, except I decreased the random_page_cost. Just clarifying.

> I see huge underestimation in the simple scan:
> 
> Bitmap Heap Scan on metainfo b_1
>  (cost=23.96..35.77 rows=3 width=38)
>  (actual time=1.225..4.206 rows=1025 loops=1)
> 
> It may be caused by some functional dependency in its filter:
> 
> ((relation = 'located'::text) AND (type = 'document'::text))
> 
> You can create extended statistics on the columns 'relation' and 'type'. These statistics can reduce estimation errors and enable the optimiser to find a better plan without changing the cost balance.

OK, this is the first I'm learning about extended statistics...

I'm looking at https://www.postgresql.org/docs/15/sql-createstatistics.html
and https://www.postgresql.org/docs/15/planner-stats.html#PLANNER-STATS-EXTENDED

What kind of extended statistics do you suggest for this? ndistinct, dependencies, or mcv?

CREATE STATISTICS tablename_stats (<statistics type?>) ON relation, type FROM tablename;
ANALYZE tablename;

Thanks,
Ed