On Fri, Aug 2, 2013 at 3:27 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> Scott Marlowe <scott.marlowe@xxxxxxxxx> writes:
>>> Yep. Added the indexes and performance went right into the dumper. The
>>> new plan on the new table, with the old data added in random order, now
>>> looks like the old table's, only worse because it's on a slower drive.
>>> Just to be complete, here's the plan: http://explain.depesz.com/s/PYH
>>> Note that I created the new table with order by random() and created
>>> the indexes. Ran analyze on it, and the select plan looks similar now:
>>> http://explain.depesz.com/s/bsE
>>>
>>> So maybe I can make a test case now. But to summarize: when it can use
>>> indexes this query gets REAL slow because it lacks a materialize step.
>>> Does that seem about right?
>>
>> Well, the plans shown here could *not* use a materialize step, because the
>> inner scan makes use of a value from the current outer row. The
>> materialized plan has to omit one of the index conditions from the inner
>> scan and then apply it as a join condition.
>>
>> I suspect the real reason that the fast case is fast is that the inner
>> relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
>> condition, is empty, and thus the join runs very quickly. But the planner
>> doesn't know that. Its estimate of the row count isn't very large, but
>> it's definitely not zero, plus it thinks that adding the additional index
>> condition reduces the rowcount by a factor of 3 from there. So it comes
>> to the wrong conclusion about the value of materializing a fixed inner
>> relation as opposed to using a parameterized indexscan.
>>
>> Have you tried increasing the statistics targets for these join columns?
>> It's also possible that what you need to do is adjust the planner's
>> cost parameters ...
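For anyone following along, a minimal sketch of Tom's two suggestions
(table and column names are taken from the plans later in this message;
the target values are illustrative, not recommendations):

```sql
-- Raise the per-column statistics target (the cluster default is 100),
-- then re-analyze so the planner gets a finer-grained histogram/MCV list.
ALTER TABLE pp_test_wide ALTER COLUMN tree_sortkey SET STATISTICS 1000;
ANALYZE pp_test_wide;

-- Planner cost parameters can be tried per-session before touching
-- postgresql.conf:
SET random_page_cost = 1.0;          -- low value for RAM/SSD-resident data
SET effective_cache_size = '1000GB'; -- value tried later in this thread
```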
> I've tried changing random_page_cost, seq_page_cost, the cpu*
> costs, and setting effective_cache_size all over the place, and it
> stays just as slow.
>
> Our default stats target is 100. Did a stats target = 1000 on the
> three cols we access. Same terrible performance. Plan here:
> http://explain.depesz.com/s/XVt
> With stats target = 10000, same bad performance, plan here:
> http://explain.depesz.com/s/kJ54 -- pretty much the same. Setting
> effective_cache_size = '1000GB' makes no difference, still slow.
>
> Setting random_page_cost to 75 makes it work, i.e. a materialize
> shows up. Note that we run on FusionIO cards, and the whole db fits in
> memory, so a very large effective cache size and a random page cost of
> 1.0 are actually accurate for our hardware.

OK, I've done some more research. Here's the plan with a low
random_page_cost:

Nested Loop Anti Join  (cost=0.00..1862107.69 rows=9274 width=994) (actual time=0.181..183413.479 rows=17391 loops=1)
  Join Filter: (p.tree_sortkey <= tree_right(pp_test_wide.tree_sortkey))
  ->  Index Scan using pp_test_wide_tree_sortkey_idx on pp_test_wide p  (cost=0.00..10108.15 rows=10433 width=983) (actual time=0.164..218.230 rows=17391 loops=1)
        Index Cond: ((tree_sortkey >= B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'::bit varying) AND (tree_sortkey >= B'00000000'::bit varying) AND (tree_sortkey <= B'0000000011111111111111111111111111111111'::bit varying))
        Filter: (deleted_at IS NULL)
  ->  Index Scan using pp_test_wide_tree_sortkey_idx on pp_test_wide  (cost=0.00..194.86 rows=14 width=11) (actual time=10.530..10.530 rows=0 loops=17391)
        Index Cond: ((pp_test_wide.tree_sortkey >= B'00000000000101010000010001010100'::bit varying) AND (pp_test_wide.tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'::bit varying) AND (p.tree_sortkey >= pp_test_wide.tree_sortkey))
        Filter: ((pp_test_wide.product_name IS NOT NULL) AND (pp_test_wide.tree_sortkey <> B'00000000000101010000010001010100'::bit varying))
Total runtime: 183421.226 ms

Note that the bottom index scan returns 0 rows but runs 17391 loops at
~10 ms each, which adds up to just under the total time of the nested
loop anti-join, ~183,000 ms.

If I crank random_page_cost up to, say, 100, I get this plan:

Nested Loop Anti Join  (cost=15202.10..144269.58 rows=9292 width=997) (actual time=71.281..271.018 rows=17391 loops=1)
  Join Filter: ((p.tree_sortkey >= pp_test_wide.tree_sortkey) AND (p.tree_sortkey <= tree_right(pp_test_wide.tree_sortkey)))
  ->  Seq Scan on pp_test_wide p  (cost=0.00..16009.92 rows=10453 width=986) (actual time=0.024..183.341 rows=17391 loops=1)
        Filter: ((deleted_at IS NULL) AND (tree_sortkey >= B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'::bit varying) AND (tree_sortkey >= B'00000000'::bit varying) AND (tree_sortkey <= B'0000000011111111111111111111111111111111'::bit varying))
  ->  Materialize  (cost=15202.10..15202.54 rows=44 width=11) (actual time=0.004..0.004 rows=0 loops=17391)
        ->  Seq Scan on pp_test_wide  (cost=0.00..15202.06 rows=44 width=11) (actual time=71.245..71.245 rows=0 loops=1)
              Filter: ((product_name IS NOT NULL) AND (tree_sortkey >= B'00000000000101010000010001010100'::bit varying) AND (tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'::bit varying) AND (tree_sortkey <> B'00000000000101010000010001010100'::bit varying))
Total runtime: 272.204 ms

Note that here we feed two seq scans into the anti-join, and the time
spent in each seq scan is quite low. I've also played around with the
selectivity so that the inner scan delivers 53 rows instead of none,
and query performance was about the same for both plans (i.e.
random_page_cost = 1 is super slow, lots of 10 ms loops;
random_page_cost = 100 is fast with two seq scans).
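For reference, reading the plan nodes above back into SQL, the query
under test is presumably shaped roughly like the anti-join below. This
is my reconstruction from the plan's filter conditions only: the actual
query text, the select list, and the `pp` alias are guesses, and
tree_right() is the application's own function, not a built-in.

```sql
SELECT p.*
FROM pp_test_wide p
WHERE p.deleted_at IS NULL
  AND p.tree_sortkey >= B'00000000000101010000010001010100'
  AND p.tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'
  AND NOT EXISTS (
        SELECT 1
        FROM pp_test_wide pp
        WHERE pp.product_name IS NOT NULL
          AND pp.tree_sortkey <> B'00000000000101010000010001010100'
          AND pp.tree_sortkey >= B'00000000000101010000010001010100'
          AND pp.tree_sortkey <= B'0000000000010101000001000101010011111111111111111111111111111111'
          AND p.tree_sortkey >= pp.tree_sortkey
          AND p.tree_sortkey <= tree_right(pp.tree_sortkey));
```

With a query of this shape, the two plans above should be reproducible
by toggling only the cost parameter (SET random_page_cost = 1.0; vs.
SET random_page_cost = 100;) before running EXPLAIN ANALYZE.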
It seems to me that choosing the parameterized index scan is a mistake
whether the inner side is estimated at 14 rows or 0, since the planner
knows it will repeat that inner scan once per row of the outer index
scan -- i.e. it should be guesstimating that it's going to do ~10,000
loops of the inner index scan. It's as if the query planner thinks each
loop is going to be MUCH faster than 10 ms.

I can create an extract of this table for testing if someone wants one
to explore with. Let me know.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance