Hi, When join two table on multiple columns equaljoin, rows estimation always use selectivity = multiplied by distinct multiple individual columns, possible to use extended n-distinct statistics on multiple columns?
PG v14.8-1, attached please check test case with details. Thanks, James |
testdb=> \d test1 Table "csdigital.test1" Column | Type | Collation | Nullable | Default -----------------+------------------------+-----------+----------+--------- account_key | character varying(36) | | not null | subscription_id | character varying(36) | | not null | site_key | character varying(36) | | not null | site_id | numeric(38,0) | | | site_name | character varying(128) | | | Indexes: "test1_pk" PRIMARY KEY, btree (account_key, subscription_id, site_key) testdb=> \d test2 Table "csdigital.test2" Column | Type | Collation | Nullable | Default -----------------+------------------------+-----------+----------+--------- account_key | character varying(36) | | not null | subscription_id | character varying(36) | | not null | offer_code | character varying(128) | | not null | offer_name | character varying(128) | | | offer_type | character varying(128) | | | Indexes: "test2_pk" PRIMARY KEY, btree (account_key, subscription_id, offer_code) testdb=> explain select count(*) from test1 t1,test2 t2 where t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=629339.31..629339.32 rows=1 width=8) -> Merge Join (cost=1.99..629339.26 rows=22 width=0) Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND ((t1.subscription_id)::text = (t2.subscription_id)::text)) -> Index Only Scan using test1_pk on test1 t1 (cost=0.56..263134.00 rows=4906772 width=61) -> Index Only Scan using test2_pk on test2 t2 (cost=0.56..309937.54 rows=6346934 width=59) (5 rows) Time: 8.018 ms testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND tablename='test1' ORDER BY 1 DESC; frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation ---------------+-----------+-----------------+-----------+-----------+------------+-------+--------+-------------- 0.00026666667 | test1 | account_key | f | 0 | 249474 | 1 | 101 | -0.008081022 0.00026666667 | test1 | subscription_id | f | 0 | -0.6047012 | 1 | 101 | 0.26008433 (2 rows) Time: 7.760 ms testdb=> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation testdb-> FROM pg_stats WHERE attname in ('account_key','subscription_id') AND tablename='test2' ORDER BY 1 DESC; frac_mcv | tablename | attname | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation ----------+-----------+-----------------+-----------+-----------+-------------+-------+--------+------------- | test2 | account_key | f | 0 | 325385 | | 101 | 0.006174604 | test2 | subscription_id | f | 0 | -0.68929785 | | 101 | 0.123789854 (2 rows) Time: 1.423 ms testdb=> select relname,reltuples from pg_class where relname in ('test1','test2'); relname | reltuples ---------+-------------- test1 | 4.906772e+06 test2 | 6.346934e+06 (2 rows) Time: 0.746 ms testdb=> create statistics cs_contract_site_map_account_key_subscription_id(dependencies, ndistinct) on account_key, subscription_id from test1; CREATE STATISTICS Time: 7.918 ms testdb=> create statistics cs_contract_account_key_subscription_id(dependencies, ndistinct) on account_key, subscription_id from test2; CREATE STATISTICS Time: 1.558 ms testdb=> analyze test1; aANALYZE Time: 745.952 ms testdb=> analyze test2; ANALYZE Time: 606.735 ms testdb=> select st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies testdb-> from pg_statistic_ext st join pg_statistic_ext_data std on st.oid=std.stxoid; ERROR: permission denied for table pg_statistic_ext_data Time: 2.464 ms testdb=# select st.stxrelid,st.stxname,st.stxstattarget,st.stxkeys,std.stxdndistinct,std.stxddependencies testdb-# from pg_statistic_ext st join pg_statistic_ext_data std on st.oid=std.stxoid; stxrelid | stxname | stxstattarget | stxkeys | stxdndistinct | stxddependencies ----------+--------------------------------------------------+---------------+---------+-------------------+------------------------------------------ 5920481 | cs_contract_site_map_account_key_subscription_id | -1 | 1 2 | {"1, 2": 3552572} | {"1 => 2": 0.891333, "2 => 1": 0.997467} 5920675 | cs_contract_account_key_subscription_id | -1 | 1 2 | {"1, 2": 6173026} | {"1 => 2": 0.917700, "2 => 1": 0.997400} (2 rows) testdb=> explain select count(*) from test1 t1,test2 t2 where t1.account_key=t2.account_key and t1.subscription_id=t2.subscription_id; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=629329.20..629329.21 rows=1 width=8) -> Merge Join (cost=1.28..629329.15 rows=23 width=0) <<< here we expect extended ndistinct used to rows = much more value, but still use distinct(account_key)*distinct(subscription_id) Merge Cond: (((t1.account_key)::text = (t2.account_key)::text) AND ((t1.subscription_id)::text = (t2.subscription_id)::text)) -> Index Only Scan using test1_pk on test1 t1 (cost=0.56..263134.59 rows=4906790 width=61) -> Index Only Scan using test2_pk on test2 t2 (cost=0.56..309935.05 rows=6346754 width=59) (5 rows) Time: 4.307 ms