On Wed, Sep 17, 2014 at 7:11 PM, Josh Berkus <josh@xxxxxxxxxxxx> wrote: > Folks, > > Just encountered another case of critical fail for abort-early query > plans. In this case, it will completely prevent a user from upgrading > to 9.3; this is their most common query, and on 9.3 it takes 1000X longer. > > Maybe we should think about removing abort-early plans from 9.5? > Clearly we don't understand them well enough for them to work for users. > > Query: > > SELECT "categories".* FROM "categories" WHERE "categories"."user_id" IN > ( SELECT to_user_id FROM "tags" WHERE "tags"."from_user_id" = 53529975 ) > ORDER BY recorded_on DESC LIMIT 20; > > Here's the plan from 9.1: > > Limit (cost=1613.10..1613.15 rows=20 width=194) (actual > time=0.503..0.509 rows=20 loops=1) > -> Sort (cost=1613.10..1736.14 rows=49215 width=194) (actual > time=0.502..0.505 rows=20 loops=1) > Sort Key: categories.recorded_on > Sort Method: top-N heapsort Memory: 30kB > -> Nested Loop (cost=248.80..303.51 rows=49215 width=194) > (actual time=0.069..0.347 rows=81 loops=1) > -> HashAggregate (cost=248.80..248.81 rows=1 width=4) > (actual time=0.050..0.054 rows=8 loops=1) > -> Index Scan using unique_index_tags on tags > (cost=0.00..248.54 rows=103 width=4) (actual time=0.020..0.033 rows=8 > loops=1) > Index Cond: (from_user_id = 53529975) > -> Index Scan using index_categories_on_user_id on > categories (cost=0.00..54.34 rows=29 width=194) (actual > time=0.010..0.028 rows=10 loops=8) > Index Cond: (user_id = tags.to_user_id) > Total runtime: 0.641 ms > > And from 9.3: > > Limit (cost=1.00..2641.10 rows=20 width=202) (actual > time=9.933..711.372 rows=20 loops=1) > -> Nested Loop Semi Join (cost=1.00..9641758.39 rows=73041 > width=202) (actual time=9.931..711.361 rows=20 loops=1) > -> Index Scan Backward using index_categories_on_recorded_on > on categories (cost=0.43..406943.98 rows=4199200 width=202) (actual > time=0.018..275.020 rows=170995 loops=1) > -> Index Scan using unique_index_tags on tags > (cost=0.57..2.20 rows=1 width=4) (actual time=0.002..0.002 rows=0 > loops=170995) > Index Cond: ((from_user_id = 53529975) AND (to_user_id = > categories.user_id)) > Total runtime: 711.457 ms > > So, here's what's happening here: > > As usual, PostgreSQL is dramatically undercounting n_distinct: it shows > chapters.user_id at 146,000 and the ratio of to_user_id:from_user_id as > being 1:105 (as opposed to 1:6, which is about the real ratio). This > means that PostgreSQL thinks it can find the 20 rows within the first 2% > of the index ... whereas it actually needs to scan 50% of the index to > find them. > > Removing LIMIT causes 9.3 to revert to the "good" plan, as expected. > > This is the core issue with abort-early plans; they depend on our > statistics being extremely accurate, which we know they are not. And if > they're wrong, the execution time climbs by 1000X or more. Abort-early > plans are inherently riskier than other types of query plans. > > What I'm not clear on is why upgrading from 9.1 to 9.3 would bring about > this change. The stats are no more than 10% different across the > version change. Amusingly on-topic rant I happened to read immediately after this by chance: http://wp.sigmod.org/?p=1075 Is there a canonical case of where 'abort early' plans help? (I'm new to that term -- is it a recent planner innovation...got any handy links?) merlin -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance