Re: Performance of lateral join

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Tue, 27 Jul 2021 21:27:33 -0500

On Tue, Jul 27, 2021 at 09:08:49AM +0000, Simen Andreas Andreassen Lønsethagen wrote:
> >    Easy first question: is the temp table analyzed before being used in a join ?
> 
> No, I haven't done that. Today, I tried to run 
> 
> 	ANALYZE records_to_filter_on;
> 
> on the same sample data set (3.75 million rows) before the join, and it did not seem to make much of a difference in terms of time (new output of EXPLAIN ANALYZE at https://explain.dalibo.com/plan/YZu - it seems very similar to me). 

If the "shape" of the plan didn't change, then ANALYZE had no effect.

I think you'd see an improvement if both tables were ordered by foedselsnummer.
It might be that that's already somewhat/partially true (?)

I suggest to create an index on the temp table's r.foedselsnummer, CLUSTER on
that index, and then ANALYZE the table.  The index won't be useful for this
query, it's just for clustering (unless you can instead populate the temp table
in order).

Check if there's already high correlation of dpd_bis_foedselsnummer (over 0.9):
| SELECT tablename, attname, inherited, null_frac, n_distinct, correlation FROM pg_stats WHERE attname='dpd_bis_foedselsnummer' AND tablename='...';

If not, consider clustering on the existing "unique_descending" index and then
analyzing that table, too.

This would also affect performance of other queries - hopefully improving
several things at once.

-- 
Justin