Re: Nested loops overpriced

Peter Eisentraut <peter_e@xxxxxxx> · Wed, 9 May 2007 18:17:44 +0200

Am Mittwoch, 9. Mai 2007 16:11 schrieb Tom Lane:
> Well, there's something funny going on here.  You've got for instance
>
>            ->  Index Scan using email_pkey on email  (cost=0.00..3.85
> rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990) Index Cond:
> (email.email_id = eh_from.email_id)
>                  Filter: (("time" >= '2007-05-05 17:01:59'::timestamp
> without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without
> time zone))
>
> on the inside of a nestloop whose outer side is predicted to return
> 107156 rows.  That should've been discounted to *way* less than 3.85
> cost units per iteration.

This is the new plan with 8.2.4.  It's still got the same problem, though.

                                                                                        QUERY PLAN                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=5627064.21..5627718.73 rows=32726 width=184) (actual time=4904.834..5124.585 rows=35000 loops=1)
   ->  Sort  (cost=5627064.21..5627146.03 rows=32726 width=184) (actual time=4904.771..4947.892 rows=35000 loops=1)
         Sort Key: eh_subj.header_body
         ->  Nested Loop  (cost=0.00..5624610.06 rows=32726 width=184) (actual time=0.397..4628.141 rows=35000 loops=1)
               ->  Nested Loop  (cost=0.00..1193387.12 rows=28461 width=120) (actual time=0.322..3960.360 rows=35000 loops=1)
                     ->  Nested Loop  (cost=0.00..1081957.26 rows=28648 width=112) (actual time=0.238..3572.023 rows=35000 loops=1)
                           ->  Index Scan using dummy_index on email_header eh_from  (cost=0.00..13389.15 rows=280662 width=104) (actual time=0.133..1310.248 rows=280990 loops=1)
                           ->  Index Scan using email_pkey on email  (cost=0.00..3.79 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=280990)
                                 Index Cond: (email.email_id = eh_from.email_id)
                                 Filter: (("time" >= '2007-05-05 17:01:59'::timestamp without time zone) AND ("time" < '2007-05-05 18:01:59'::timestamp without time zone))
                     ->  Index Scan using mime_part_pkey on mime_part  (cost=0.00..3.88 rows=1 width=12) (actual time=0.005..0.006 rows=1 loops=35000)
                           Index Cond: ((email.email_id = mime_part.email_id) AND (mime_part.mime_part_id = 0))
               ->  Index Scan using idx__email_header__email_id__mime_part_id on email_header eh_subj  (cost=0.00..155.47 rows=18 width=104) (actual time=0.009..0.014 rows=1 loops=35000)
                     Index Cond: ((email.email_id = eh_subj.email_id) AND (0 = eh_subj.mime_part_id))
                     Filter: (header_name = 'subject'::text)
 Total runtime: 5161.390 ms

> Are you using any nondefault planner settings?

random_page_cost = 3
effective_cache_size = 384MB

> How big are these tables, anyway?

email		35 MB
email_header	421 MB
mime_part	37 MB

Everything is analyzed, vacuumed, and reindexed.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/