Re: subselect requires offset 0 for good performance.

Scott Marlowe <scott.marlowe@xxxxxxxxx> · Fri, 2 Aug 2013 15:27:41 -0600

On Fri, Aug 2, 2013 at 2:51 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
> Scott Marlowe <scott.marlowe@xxxxxxxxx> writes:
>> Yep. Added the indexes and performance went right into the dumper. New
>> plan on new table with old data added in random order now looks like
>> the old table, only worse because it's on a slower drive. Just to be
>> complete here's the plan: http://explain.depesz.com/s/PYH Note that I
>> created new table with order by random() and created indexes. Ran
>> analyze on it, and the select plan looks similar now:
>> http://explain.depesz.com/s/bsE
>
>> So maybe I can make a test case now. But to summarize, when it can use
>> indexes this query gets REAL slow because it lacks a materialize step.
>> That seem about right?
>
> Well, the plans shown here could *not* use a materialize step because the
> inner scan makes use of a value from the current outer row.  The
> materialized plan has to omit one of the index conditions from the inner
> scan and then apply it as a join condition.
>
> I suspect the real reason that the fast case is fast is that the inner
> relation, even without the p.tree_sortkey >= pro_partners.tree_sortkey
> condition, is empty, and thus the join runs very quickly.  But the planner
> doesn't know that.  Its estimate of the row count isn't very large, but
> it's definitely not zero, plus it thinks that adding the additional index
> condition reduces the rowcount by a factor of 3 from there.  So it comes
> to the wrong conclusion about the value of materializing a fixed inner
> relation as opposed to using a parameterized indexscan.
>
> Have you tried increasing the statistics targets for these join columns?
> It's also possible that what you need to do is adjust the planner's
> cost parameters ...

I've tried changing random_page_cost, sequential_page_cost, the cpu*
costs, and setting effective_cache_size all over the place and it
stays just as slow.

our default stats target is 100. Did a stats target = 1000 on the
three cols we access. Same terrible performance. Plan here:
http://explain.depesz.com/s/XVt
stats target=10000, same bad performance, plan:
http://explain.depesz.com/s/kJ54 pretty much the same.  Setting
effective_cache_size='1000GB' make no difference, still slow.

If I set random_page_cost to 75 makes it work, i.e. a materialize
shows up. Note that we run on FusionIO cards, and the whole db fits in
memory, so a very large effective cache size and random page cost of
1.0 is actually accurate for our hardware.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance