On 2015-12-17 16:23, Tom Lane wrote:
Mikkel Lauritsen <renard@xxxxxxx> writes:
The schema contains two tables, t1 and t2. t2 has two fields, an id and a tag, and it contains 146 rows that are all unique. t1 has two fields, a value and a foreign key referring to t2.id, and it contains 266177 rows.
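For concreteness, the schema would look something like the following; the column names and types are illustrative guesses, since the post doesn't give the actual DDL:

    CREATE TABLE t2 (
        id  integer PRIMARY KEY,
        tag text NOT NULL            -- 146 rows, each (id, tag) pair unique
    );

    CREATE TABLE t1 (
        value integer,                       -- "a value"; type is a guess
        t2_id integer REFERENCES t2 (id)     -- foreign key to t2.id
    );                                       -- 266177 rows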
The application retrieves the rows in t1 that match a specific tag in t2, and it turned out that the contents of t1 were distributed in a very lopsided way, with more than 90% of the rows referring to one of two tags from t2:
...
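The query and its EXPLAIN output were snipped above, but the shape was presumably something like this, with a hypothetical tag value:

    SELECT t1.value
    FROM t1
    JOIN t2 ON t2.id = t1.t2_id
    WHERE t2.tag = 'common-tag';   -- actually matches >90% of t1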
The estimate for the number of rows in the result (1824) is way too low, and that leads to bad plans: queries involving more joins on these tables run about 1000x slower than they should.
I have currently rewritten the application code to do two queries: one to retrieve the id from t2 that matches the given tag, and one to retrieve the rows from t1. That's a usable workaround, but not something we really like as a permanent solution. Fiddling with the various statistics-related knobs seems to make no difference, but is there some other way I can make Postgres assume high selectivity for certain tag values? Am I just SOL with the given schema?
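For reference, the two-query workaround amounts to something like this (identifiers as in the sketch above; the id value is hypothetical):

    -- Step 1: resolve the tag to an id (returns, say, 42).
    SELECT id FROM t2 WHERE tag = 'common-tag';

    -- Step 2: query t1 with that id as a literal.  The planner can now
    -- consult the most-common-values statistics on t1.t2_id and see the
    -- true row count, instead of assuming an average id.
    SELECT value FROM t1 WHERE t2_id = 42;

The statistics knobs aren't named in the post; the usual first attempt is raising the per-column statistics target, which doesn't help here because the problem is the unknown tag-to-id correlation, not the sample size:

    ALTER TABLE t1 ALTER COLUMN t2_id SET STATISTICS 1000;
    ANALYZE t1;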
You're pretty much SOL. Lacking cross-column statistics, the planner has no idea which t2.id goes with the given tag, so it can't see that the selected id is the one that is most common in t1. You're getting a join size estimate that is basically the size of t1 divided by the number of possible values (146), which is about the best we can do without knowing which id is selected.
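(As a sanity check on that arithmetic: 266177 / 146 ≈ 1823, which lines up with the 1824-row estimate quoted above; the small difference is presumably rounding in the planner's statistics.)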
--- snip ---
Thanks - I thought as much, but it's really nice to have it confirmed by people who are way more knowledgeable.
Best regards and thanks again,
Mikkel Lauritsen