Folks, Just encountered another case of critical fail for abort-early query plans. In this case, it will completely prevent a user from upgrading to 9.3; this is their most common query, and on 9.3 it takes 1000X longer. Maybe we should think about removing abort-early plans from 9.5? Clearly we don't understand them well enough for them to work for users. Query: SELECT "categories".* FROM "categories" WHERE "categories"."user_id" IN ( SELECT to_user_id FROM "tags" WHERE "tags"."from_user_id" = 53529975 ) ORDER BY recorded_on DESC LIMIT 20; Here's the plan from 9.1: Limit (cost=1613.10..1613.15 rows=20 width=194) (actual time=0.503..0.509 rows=20 loops=1) -> Sort (cost=1613.10..1736.14 rows=49215 width=194) (actual time=0.502..0.505 rows=20 loops=1) Sort Key: categories.recorded_on Sort Method: top-N heapsort Memory: 30kB -> Nested Loop (cost=248.80..303.51 rows=49215 width=194) (actual time=0.069..0.347 rows=81 loops=1) -> HashAggregate (cost=248.80..248.81 rows=1 width=4) (actual time=0.050..0.054 rows=8 loops=1) -> Index Scan using unique_index_tags on tags (cost=0.00..248.54 rows=103 width=4) (actual time=0.020..0.033 rows=8 loops=1) Index Cond: (from_user_id = 53529975) -> Index Scan using index_categories_on_user_id on categories (cost=0.00..54.34 rows=29 width=194) (actual time=0.010..0.028 rows=10 loops=8) Index Cond: (user_id = tags.to_user_id) Total runtime: 0.641 ms And from 9.3: Limit (cost=1.00..2641.10 rows=20 width=202) (actual time=9.933..711.372 rows=20 loops=1) -> Nested Loop Semi Join (cost=1.00..9641758.39 rows=73041 width=202) (actual time=9.931..711.361 rows=20 loops=1) -> Index Scan Backward using index_categories_on_recorded_on on categories (cost=0.43..406943.98 rows=4199200 width=202) (actual time=0.018..275.020 rows=170995 loops=1) -> Index Scan using unique_index_tags on tags (cost=0.57..2.20 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=170995) Index Cond: ((from_user_id = 53529975) AND (to_user_id = categories.user_id)) Total runtime: 711.457 ms So, here's what's happening here: As usual, PostgreSQL is dramatically undercounting n_distinct: it shows chapters.user_id at 146,000 and the ratio of to_user_id:from_user_id as being 1:105 (as opposed to 1:6, which is about the real ratio). This means that PostgreSQL thinks it can find the 20 rows within the first 2% of the index ... whereas it actually needs to scan 50% of the index to find them. Removing LIMIT causes 9.3 to revert to the "good" plan, as expected. This is the core issue with abort-early plans; they depend on our statistics being extremely accurate, which we know they are not. And if they're wrong, the execution time climbs by 1000X or more. Abort-early plans are inherently riskier than other types of query plans. What I'm not clear on is why upgrading from 9.1 to 9.3 would bring about this change. The stats are no more than 10% different across the version change. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance