On Fri, Feb 27, 2009 at 3:18 PM, Alexander Staubo <alex@xxxxxxxxxx> wrote:
> On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex@xxxxxxxxxx> wrote:
>> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@xxxxxxxxx> wrote:
>>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@xxxxxxxxxx> wrote:
>>>>
>>>> Output from "explain analyze":
>>>>
>>>> Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>>>>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>>>>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
>>>
>>> Those estimates are pretty far off. Did you try increasing the
>>> statistics target? Also, is the first query repeatable (that is, is it
>>> already in cache when you do the test, or alternately, are all queries
>>> *out* of cache when you test?)
>
> All right, this query keeps coming back to bite me. If this part of the join:
>
>     ... and section_items.sandbox_id = 16399
>
> yields a sufficiently large number of matches, then performance goes
> 'boink', like so:
>
> Limit  (cost=0.00..34.86 rows=4 width=48) (actual time=4348.696..4348.696 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual time=4348.695..4348.695 rows=0 loops=1)
>         ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual time=0.093..138.029 rows=145108 loops=1)
>               Index Cond: (start_time > '2009-02-27 18:01:14.739411+01'::timestamp with time zone)
>         ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0 loops=145108)
>               Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
>               Filter: (section_items.sandbox_id = 9)
> Total runtime: 4348.777 ms
>
> In this case:
>
> # select count(*) from section_items where sandbox_id = 9;
>  count
> -------
>   3126
>
> If I remove the start_time > ... clause, performance is fine. Upping
> the statistics setting on any of the columns involved seems to have no
> effect.
>
> Is this a pathological border case, or is there something I can do to
> *generally* make this query run fast? Keep in mind that the query
> itself returns no rows at all. I want to avoid doing an initial
> "select count(...)" just to avoid the bad plan. Suffice to say, having
> a web request take 5 seconds is asking too much from our users.

The problem here is that the planner estimates the cost of a Limit plan node by adding up (1) the startup cost of the underlying plan node - in this case 0 for the nestloop - and (2) a percentage of the run cost, based on the ratio of the number of rows the limit needs to the total number of rows the underlying node is expected to return. In this case, the nested loop is expected to return 6944 rows, so the planner figures it won't have to get very far to find the 4 you requested.

So when the LIMIT is a little bigger, or missing, the planner tries to minimize the cost of the whole operation, whereas when the limit is very small, it picks a plan that is much slower overall on the theory that it will be able to quit long before finishing the whole thing. When that turns out to be false, you get burned. That means the root cause of the problem is that the join is estimated to return hundreds or thousands of rows when in fact it returns zero or one.
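To put numbers on that, here is roughly the calculation the planner did for the plan above (back-of-the-envelope; the real cost model has a few more wrinkles than this):

    limit cost ~ startup cost + (4 / 6944) * run cost
               = 0.00 + (4 / 6944) * 60521.56
               = 34.86

That is exactly the 34.86 you see on the Limit node, and it looks like a steal next to the 60521.56 it would cost to run the nestloop to completion - which is why the planner picks this plan.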
But it's hard to see how you could make that estimate much better. The nestloop is expected to output 6944 rows, and the index scan on event_occurrences is expected to return 145712 rows. So the planner knows that only a tiny fraction of the rows in event_occurrences are going to have a match in section_items (roughly 6944/145712, or about 5%) - it just doesn't think the fraction is quite tiny enough to keep it from making a bad decision.

Interestingly, I think the solution Tom and I were discussing for another problem in this area would make your case MUCH WORSE:

http://archives.postgresql.org/message-id/603c8f070902180908j3ae46774g535d96ece2c90e74@xxxxxxxxxxxxxx

I will think about this some more, but nothing is occurring to me off the top of my head.

...Robert