Re: Selectivity for lopsided foreign key columns

Tom Lane <tgl@xxxxxxxxxxxxx> · Thu, 17 Dec 2015 10:23:44 -0500

Mikkel Lauritsen <renard@xxxxxxx> writes:
> The schema contains two tables, t1 and t2.
> t2 has two fields, an id and a tag, and it contains 146 rows that are 
> unique.
> t1 has two fields, a value and a foreign key referring to t2.id, and it 
> contains 266177 rows.
> The application retrieves the rows in t1 that match a specific tag in 
> t2, and it turned out that the contents of t1 were distributed in a very 
> lopsided way, where more than 90% of the rows refer to one of two tags 
> from t2:
> ...
> The estimate for the number of rows in the result (1824) is way too low, 
> and that leads to bad plans and queries involving more joins on the 
> tables that run about 1000x slower than they should.

> I have currently rewritten the application code to do two queries; one 
> to retrieve the id from t2 that matches the given tag and one to 
> retrieve the rows from t1, and that's a usable workaround but not 
> something we really like doing as a permanent solution. Fiddling with 
> the various statistics related knobs seems to make no difference, but is 
> there be some other way I can make Postgres assume high selectivity for 
> certain tag values? Am I just SOL with the given schema?

You're pretty much SOL.  Lacking cross-column statistics, the planner has
no idea which t2.id goes with the given tag, so it can't see that the
selected id is the one that is most common in t1.  You're getting a
join size estimate that is basically size of t1 divided by number of
possible values (146), which is about the best we can do without knowing
which id is selected.

One possibility, if you can change the schema, is to denormalize by
copying the tag field into t1.  (You could enforce that it's correct
by using a two-column foreign key constraint on (id, tag).)  Then the
query would look like
SELECT * FROM t1 INNER JOIN t2 ON t1.tag = t2.tag WHERE t2.tag = '<some_tag>'
and since the planner is smart enough to deduce t1.tag = '<some_tag>' from
that, it would arrive at the correct estimate for any particular tag.

			regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance