I can understand how the planner makes its guess, but in practice it will almost always be wrong, because even the most commented articles only have around 5000 or so comments. I ran EXPLAIN ANALYZE tonight and got these results:
EXPLAIN ANALYZE SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3588.42 rows=1 width=8) (actual time=498597.115..498597.116 rows=1 loops=1)
-> Nested Loop (cost=0.00..28524312.40 rows=7949 width=8) (actual time=498597.114..498597.114 rows=1 loops=1)
-> Index Scan using _comment_pkey on _comment (cost=0.00..28448324.73 rows=7949 width=16) (actual time=498473.360..498473.360 rows=1 loops=1)
Filter: (parent_id = 17355952::bigint)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8) (actual time=63.465..63.465 rows=1 loops=1)
Index Cond: (_article.id = 17355952::bigint)
Total runtime: 498615.230 ms
(7 rows)
EXPLAIN ANALYZE SELECT _comment.id,
(get_comment_response(_comment.id)).id AS r_id
FROM _comment
INNER JOIN _article
ON _article.id = _comment.parent_id
WHERE _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=10261.19..10263.69 rows=1000 width=8) (actual time=127.037..127.267 rows=1000 loops=1)
-> Sort (cost=10261.19..10281.06 rows=7949 width=8) (actual time=127.036..127.128 rows=1000 loops=1)
Sort Key: _comment.id
Sort Method: top-N heapsort Memory: 95kB
-> Nested Loop (cost=0.00..9825.35 rows=7949 width=8) (actual time=0.472..122.986 rows=4674 loops=1)
-> Index Scan using _article_pkey on _article (cost=0.00..9.55 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=1)
Index Cond: (id = 17355952::bigint)
-> Index Scan using idx_comment_parent_id on _comment (cost=0.00..9716.44 rows=7949 width=16) (actual time=0.235..32.869 rows=4674 loops=1)
Index Cond: (_comment.parent_id = 17355952::bigint)
Total runtime: 127.410 ms
(10 rows)
As you can see, the time is dramatically longer with LIMIT 1 (or in our case, LIMIT 2).
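One common workaround for this pattern (a suggestion based on the plans above, not an index that exists in this schema) is a composite index covering both the filter column and the sort column, so the planner can satisfy the WHERE and the ORDER BY with a single ordered index scan and stop after the first matching row:

```sql
-- Hypothetical index: (parent_id, id) serves both the equality filter on
-- parent_id and the ORDER BY id, so "WHERE parent_id = ? ORDER BY id LIMIT 1"
-- can read just one index entry instead of walking _comment_pkey and
-- filtering, as in the slow plan above.
CREATE INDEX idx_comment_parent_id_id ON _comment (parent_id, id);
```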
Yannick.
2009/1/20 David Wilson <david.t.wilson@xxxxxxxxx>
On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart
<yannick@xxxxxxxxxxxxx> wrote:
>
> The second query scans the whole comment table which is very dangerous for
> production servers.
That's not quite true. The second does an index scan- the planner
seems to be guessing that it'll fulfill the required limit early in
the index scan; only with a pathologically bad case would it actually
have to scan the entire thing. Basically, the second query is
optimized to spit out the first few rows quickly, since that's all you
asked for with the limit.
Note that your first query has a final cost estimate of "Limit
(cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
10261.19 to emit the first row; the second has "Limit
(cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically,
instant) to emit the first - and only desired - row.
That all said, an explain analyze would give us a better idea of
what's going on- we can't tell if the planner is making bad estimates
without the knowledge of what the real timing and row count results of
plan stages were.
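If the estimates do turn out to be off (here the planner guessed 7949 rows where the article actually has 4674), a common first step (generic PostgreSQL tuning, not something from this thread) is to raise the statistics target on the column so ANALYZE samples it more finely:

```sql
-- Hypothetical tuning: collect more detailed statistics on parent_id so the
-- planner's per-article row estimates track the real comment counts better.
ALTER TABLE _comment ALTER COLUMN parent_id SET STATISTICS 1000;
ANALYZE _comment;
```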
--
- David T. Wilson
david.t.wilson@xxxxxxxxx