Selectivity for lopsided foreign key columns

Hi all,

I have an application that runs in production in multiple instances, and on one of these the performance of certain queries suddenly became truly abysmal. I basically know why, but I would much appreciate a deeper understanding of the selectivity function involved and of any means of making Postgres choose a better plan. In the following I have tried to boil the problem down to something manageable:

The schema contains two tables, t1 and t2.
t2 has two fields, an id and a tag, and contains 146 unique rows. t1 has two fields, a value and a foreign key referencing t2.id, and contains 266177 rows.
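
For readers who want to reproduce this, the schema can be sketched roughly as follows. The index names are taken from the plans further down; the column types are assumptions based on the ::text casts in those plans, and the primary key and the type of the value column are guesses rather than the real definitions.

```sql
-- Sketch only: types and constraints are assumed, not copied from production.
CREATE TABLE t2 (
    id  text PRIMARY KEY,   -- assumed text, per the ::text cast on t2_id
    tag text                -- assumed text, per the ::text cast on tag
);
CREATE INDEX t2_tag_idx ON t2 (tag);

CREATE TABLE t1 (
    value text,                      -- hypothetical type for the value field
    t2_id text REFERENCES t2 (id)    -- foreign key to t2.id
);
CREATE INDEX t1_t2_id_idx ON t1 (t2_id);
```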

The application retrieves the rows in t1 that match a specific tag in t2, and it turned out that the contents of t1 are distributed in a very lopsided way: more than 90% of the rows refer to one of two tags from t2. Filtering t1 directly on one of those two ids gives:

EXPLAIN SELECT * FROM t1 WHERE t2_id = '<some_id>'

Index Scan using t1_t2_id_idx on t1 (cost=0.42..7039.67 rows=103521 width=367)
  Index Cond: (t2_id = '<some_id>'::text)

The row count estimate is exactly as expected; about 39% of the rows refer to that specific tag.

What the application actually does is

EXPLAIN SELECT * FROM t1 INNER JOIN t2 ON t1.t2_id = t2.id WHERE t2.tag = '<some_tag>'

Nested Loop (cost=0.69..3152.53 rows=1824 width=558)
  -> Index Scan using t2_tag_idx on t2 (cost=0.27..2.29 rows=1 width=191)
        Index Cond: (tag = '<some_tag>'::text)
  -> Index Scan using t1_t2_id_idx on t1 (cost=0.42..3058.42 rows=9182 width=367)
        Index Cond: (t2_id = t2.id)

The estimate for the number of rows in the result (1824) is way too low compared to the 103521 rows estimated for the direct lookup on t1. That leads to bad plans, and queries that join further tables onto these two run about 1000x slower than they should.
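
As a rough sanity check, the join estimate is close to what one gets by spreading the rows of t1 evenly over the 146 rows of t2. That is only an observation about the numbers, not a statement of the exact formula the planner uses. The second query below reads the standard pg_stats view to show what ANALYZE recorded for the skewed column, which is the information the direct lookup on t1 can draw on.

```sql
-- Even-spread guess: rows in t1 divided by rows in t2 (about 1823).
SELECT round(266177 / 146.0) AS rows_if_uniform;

-- Per-column statistics for t1.t2_id as gathered by ANALYZE.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 't1'
  AND attname = 't2_id';
```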

I have currently rewritten the application code to do two queries: one to retrieve the id from t2 that matches the given tag, and one to retrieve the rows from t1. That is a usable workaround, but not something we really like as a permanent solution. Fiddling with the various statistics-related knobs seems to make no difference, but is there some other way I can make Postgres assume high selectivity for certain tag values? Am I just SOL with the given schema?
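
In SQL terms the workaround looks roughly like this, with '<some_tag>' and '<some_id>' as placeholders just as in the plans above. The second statement is the same direct lookup as in the first EXPLAIN, so it gets the accurate row estimate because the id is a known constant at planning time.

```sql
-- Step 1: resolve the tag to its id; t2 has only 146 rows, so this is cheap.
SELECT id FROM t2 WHERE tag = '<some_tag>';

-- Step 2: the application substitutes the id returned by step 1 as a constant.
SELECT * FROM t1 WHERE t2_id = '<some_id>';
```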

Any pointers to information about how to handle potentially lopsided data like this are highly welcome.

Best regards,
  Mikkel Lauritsen

