Re: Query performance issue

Justin Pryzby <pryzby@xxxxxxxxxxxxx> · Thu, 21 Jan 2021 20:35:14 -0600

On Fri, Jan 22, 2021 at 01:53:26AM +0000, Nagaraj Raj wrote:
> Tables ddl are attached in dbfiddle -- Postgres 11 | db<>fiddle
> Postgres 11 | db<>fiddle
> Server configuration is: Version: 10.11RAM - 320GBvCPU - 32 "maintenance_work_mem" 256MB"work_mem"             1GB"shared_buffers" 64GB

> Aggregate  (cost=31.54..31.55 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..31.54 rows=1 width=8) (actual time=0.007..0.008 rows=0 loops=1)
>         Join Filter: (a.household_entity_proxy_id = c.household_entity_proxy_id)
>         ->  Nested Loop  (cost=0.00..21.36 rows=1 width=16) (actual time=0.006..0.007 rows=0 loops=1)
>               Join Filter: (a.individual_entity_proxy_id = b.individual_entity_proxy_id)
>               ->  Seq Scan on prospect a  (cost=0.00..10.82 rows=1 width=16) (actual time=0.006..0.006 rows=0 loops=1)
>                     Filter: (((last_contacted_anychannel_dttm IS NULL) OR (last_contacted_anychannel_dttm < '2020-11-23 00:00:00'::timestamp without time zone)) AND (shared_paddr_with_customer_ind = 'N'::bpchar) AND (profane_wrd_ind = 'N'::bpchar) AND (tmo_ofnsv_name_ind = 'N'::bpchar) AND (has_individual_address = 'Y'::bpchar) AND (has_last_name = 'Y'::bpchar) AND (has_first_name = 'Y'::bpchar))
>               ->  Seq Scan on individual_demographic b  (cost=0.00..10.53 rows=1 width=8) (never executed)
>                     Filter: ((tax_bnkrpt_dcsd_ind = 'N'::bpchar) AND (govt_prison_ind = 'N'::bpchar) AND ((cstmr_prspct_ind)::text = 'Prospect'::text))
>         ->  Seq Scan on household_demographic c  (cost=0.00..10.14 rows=3 width=8) (never executed)
>               Filter: (((hspnc_lang_prfrnc_cval)::text = ANY ('{B,E,X}'::text[])) OR (hspnc_lang_prfrnc_cval IS NULL))
> Planning Time: 1.384 ms
> Execution Time: 0.206 ms
> 13 rows

It's doing nested loops with estimated rowcount=1, which indicates a bad
underestimate, and suggests that the conditions are redundant or correlated.

Maybe you can handle this with MV stats on the correlated columns:

CREATE STATISTICS prospect_stats (dependencies) ON
    shared_paddr_with_customer_ind, profane_wrd_ind, tmo_ofnsv_name_ind, has_individual_address, has_last_name, has_first_name
    FROM prospect;
CREATE STATISTICS individual_demographic_stats (dependencies) ON
    tax_bnkrpt_dcsd_ind, govt_prison_ind, cstmr_prspct_ind
    FROM individual_demographic_stats 
ANALYZE prospect, individual_demographic_stats ;

Since it's expensive to compute stats on large number of columns, I'd then
check *which* are correlated and then only compute MV stats on those.  This
will show col1=>col2: X where X approaches 1, the conditions are highly
correlated:
SELECT * FROM pg_statistic_ext; -- pg_statistic_ext_data since v12

Also, as a diagnostic tool to get "explain analyze" to finish, you can
SET enable_nestloop=off;

-- 
Justin