Thanks for the suggestion. I created extended statistics objects for the two tables in question. Unfortunately the resulting plan was the same (and had the same estimates).
It looks like the extended stats discovered a potentially useful
correlation on bag: "2, 3 => 1" (owner_id, bag_type_id => id). I'm guessing this wasn't usable because the docs state "They are not used to improve estimates for equality conditions comparing two columns".
I created functional dependency extended stats (none of our queries use group by), and ran analyze. The resulting objects are below. For correlations of 1, the results seemed logically correct (I'm not sure how to interpret the .966 values). The limitations section said that extended stats are only applied for simple equality conditions, so I modified the query to use equality instead of any. That still resulted in the same plan and estimate. Just to be thorough, I tried with all permutations of zero, one or both stats objects. In all cases the resulting plan and estimates didn't change from the slow hash join.
create statistics bag_type_stats (dependencies) on id, name, game from bag_type;
analyze bag_type;
create statistics bag_stats (dependencies) on id, owner_id, bag_type_id from bag;
analyze bag;
select * from pg_statistic_ext;
-[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------
stxrelid | 16411
stxname | bag_stats
stxnamespace | 2200
stxowner | 10
stxkeys | 1 2 3
stxkind | {f}
stxndistinct |
stxdependencies | {"1 => 2": 1.000000, "1 => 3": 1.000000, "2 => 1": 0.966567, "2 => 3": 0.966567, "1, 2 => 3": 1.000000, "1, 3 => 2": 1.000000, "2, 3 => 1": 1.000000}
-[ RECORD 2 ]---+------------------------------------------------------------------------------------------------------------------------------------------------------
stxrelid | 16398
stxname | bag_type_stats
stxnamespace | 2200
stxowner | 10
stxkeys | 1 2 4
stxkind | {f}
stxndistinct |
stxdependencies | {"1 => 2": 1.000000, "1 => 4": 1.000000, "2 => 1": 1.000000, "2 => 4": 1.000000, "1, 2 => 4": 1.000000, "1, 4 => 2": 1.000000, "2, 4 => 1": 1.000000}
For bag keys 1, 2, 3 are id, owner_id and bag_type_id. For bag_type 1, 2, 4 are id, name and game.
--Thanks
--Jeremy
On Fri, May 17, 2019 at 9:35 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>
> Jeremy Altavilla <jeremyaltavilla@xxxxxxxxx> writes:
> > We have several select statements whose performance is greatly improved by
> > deleting some stats from pg_statistic.
>
> You might have better results by setting up some "extended stats" for
> the combination of bag_type columns that this query depends on. Per your
> description, there's a fair amount of cross-column correlation, which
> the planner will not expect without some extended stats to tell it so.
>
> https://www.postgresql.org/docs/10/planner-stats.html#PLANNER-STATS-EXTENDED
>
> regards, tom lane